A NEW SENSITIVE METHOD FOR QUANTIFYING
ACTIVE TRANSFORMING GROWTH FACTOR-BETA
AND COMPOSITIONS THEREFOR

Technical Field

The present invention relates to a sensitive assay method
for quantifying the amount of active transforming growth factor
beta (TGF-β) and vector compositions for use therein for
expressing an indicator molecule in response to TGF-β
activation of a TGF-β response element in the vector.

Background 

Transforming growth factor beta, hereinafter referred to
as TGF-β, is a 25 kilodalton (kD) homodimeric protein that
belongs to a family of regulators of cell growth and
differentiation that includes activins, inhibins, Mullerian
inhibiting substance, the Drosophila decapentaplegic complex
and bone morphogenic proteins. For review, see Massague, Ann.
Rev. Cell Biol., 6:597-641 (1990); Roberts et al., In Peptide
Growth Factors and Their Receptors, Sporn et al., Eds.,
Springer-Verlag, Berlin, i:419-472 (1990); and Hoffman, Curr.
Opin. Cell Biol., 3:947-952 (1991). TGF-β was initially
defined by its ability to induce morphological transformation
of fibroblastic cells in monolayer culture and stimulation of
colony formation in soft agar. Delarco et al., Proc. Natl.
Acad. Sci. USA, 75:4001-4005 (1978) and Todaro et al., Proc.
Natl. Acad. Sci. USA, 77:5258-5262 (1980).

Three distinct molecular isoforms of TGF-β, the genes of
which are located on different chromosomes, have been
identified in mammals and are designated TGF-β1, TGF-β2 and
TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et
al., Proc. Natl. Acad. Sci. USA, 85:71-72 (1988); and Madisen
et al., DNA, 7:1-8 (1988). Each of the isoforms is first
synthesized as high molecular weight latent or inactive
precursor polypeptides that are then processed to 12.5 kD



WO 95/19987 



PCT/US95/01I53 



monomers. Activation of the latent complex can occur through a
variety of physicochemical or enzymatic treatments as well as in
various tissue culture systems. For review, see Barnard et
al., Biochim. Biophys. Acta, 1032:79-87 (1990). Two processed
monomers then dimerize to form biologically active TGF-β.

The activation process must occur to allow binding of the
dimerized TGF-β to the high affinity TGF-β receptors expressed
on the surfaces of all normal cells and almost all neoplastic
cells. Tucker et al., Proc. Natl. Acad. Sci. USA, 81:6757-
6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000
(1984); Pircher et al., Biochem. Biophys. Res. Commun., 136:30-
37 (1986).

Although some TGF-β activation systems generate the mature
TGF-β in nanogram quantities, the majority liberate picogram
amounts. These low concentrations, however, are sufficient to
induce a variety of biological responses such as macrophage
chemotaxis (Wahl et al., Proc. Natl. Acad. Sci. USA, 84:5788-
5792 (1987)), inhibition of endothelial cell migration and
proliferation (Heimark et al., Science, 233:1078-1080 (1986)),
stimulation of extracellular matrix deposition (Ignotz et al.,
J. Biol. Chem., 261:4337-4345 (1986)) and decreased plasminogen
activator (PA) activity as a result of decreased PA production
(Laiho et al., J. Cell Biol., 103:2403-2410 (1986) and
Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992)) along
with increased secretion of its inhibitor, plasminogen
activator inhibitor-1 (PAI-1) (Laiho et al., J. Biol. Chem.,
262:17467-17474 (1987)).

PAI-1 is the primary inhibitor of both tissue-type
plasminogen activator (t-PA) and urokinase-type plasminogen
activator (u-PA), and as such is a potent anti-fibrinolytic
molecule. PAI-1 synthesis by cultured cells in vitro is
induced by a variety of molecules including cytokines, growth
factors, hormones, and other agents such as endotoxin and
phorbol myristate acetate. Nuclear transcription run-on assays
demonstrate that the regulation of PAI-1 by many of these
agents, including TGF-β, occurs primarily at the level of
transcription.

TGF-β released from platelets may be an important negative
regulator of the fibrinolytic system of the vessel wall since
the TGF-β in releasates of thrombin-activated platelets causes
large increases in PAI-1 synthesis by endothelial cells. This
increased PAI-1 synthesis may account for the resistance of
platelet-rich thrombi to thrombolytic therapy. The
accumulation of PAI-1 in the extracellular matrix in response
to TGF-β protects matrix proteins from proteolytic degradation.
Thus, the induction of PAI-1 by TGF-β may also play a role in
both wound healing and fibrotic responses.

These and other biological effects of TGF-β activity have
been used to develop a variety of semiquantitative and
quantitative bioassays including those based on chondrogenesis,
inhibition of DNA synthesis and cell growth, differentiation,
migration or PA activity. Differentiation-based assays include
the induction of cartilage specific proteoglycan expression
(ED50 = 5 ng/ml; 200 pM) (Ogawa et al., in Peptide Growth
Factors, Barnes et al., Eds, Academic Press Inc., 198:317-327
(1991); Seyedin et al., Proc. Natl. Acad. Sci. USA, 82:2267-
2271 (1985)) and inhibition of rat L6 myoblast differentiation
(ED50 = 0.2 ng/ml; 8 pM) (Florini et al., J. Biol. Chem.,
261:16509-16513 (1986)). An ED50 represents the half-maximal
amount of factor required to produce an effect, activation or
inhibition, on differentiation of target cells. The
abbreviations ng/ml, pg/ml, nM and pM respectively stand for
nanograms/milliliter, picograms/milliliter, nanomolar and
picomolar. These assays are utilized primarily for studying
differentiation rather than for quantification of TGF-β.
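The paired mass and molar values quoted throughout this section follow from the 25 kD mass of the mature dimer stated above. The following is a minimal illustrative sketch of that arithmetic, not part of the described method; the function and constant names are hypothetical.

```python
# Illustrative unit conversion only; names are hypothetical.
# Assumes the 25 kD (25,000 g/mol) mass of the mature TGF-beta dimer
# stated in the Background.
TGF_BETA_DIMER_DA = 25_000.0


def ng_per_ml_to_pM(ng_per_ml, mol_mass_da=TGF_BETA_DIMER_DA):
    """Convert a mass concentration in ng/ml to a molar concentration in pM."""
    # 1 ng/ml equals 1e-6 g/L; dividing by g/mol gives mol/L,
    # and 1 mol/L equals 1e12 pM, hence the net factor of 1e6.
    return ng_per_ml * 1e6 / mol_mass_da


def pM_to_pg_per_ml(pM, mol_mass_da=TGF_BETA_DIMER_DA):
    """Convert a molar concentration in pM to a mass concentration in pg/ml."""
    # 1 pM equals 1e-12 mol/L; times g/mol gives g/L,
    # and 1 g/L equals 1e9 pg/ml, hence the net factor of 1e-3.
    return pM * mol_mass_da / 1000.0
```

Under this assumption, 5 ng/ml corresponds to 200 pM and 0.2 ng/ml to 8 pM, in agreement with the ED50 values cited above.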

Assays based on TGF-β's ability to inhibit DNA synthesis
and cell growth in mink lung epithelial cells (MLE cells) (ED50
= 10-20 pg/ml; 0.4-0.8 pM) (Lucas et al., In Peptide Growth
Factors, Barnes et al., Eds, Academic Press Inc., 198:303-316
(1991) and Danielpour et al., J. Cell. Physiol., 138:79-86
(1989)), African green monkey kidney epithelial cells (ED50 = 1
ng/ml; 40 pM) (Holley et al., Proc. Natl. Acad. Sci. USA,
77:5989-5992 (1980)), rat hepatocytes (ED50 = 0.4 ng/ml; 16 pM)
(Nakamura et al., Biochem. Biophys. Res. Comm., 133:1042-1050
(1985)), and fetal bovine heart endothelial cells (ED50 = 75-
125 pg/ml; 3-5 pM) (Qian et al., Proc. Natl. Acad. Sci. USA,
89:6290-6294 (1992)) are sensitive but can be affected by a
variety of molecules such as insulin, EGF, PDGF, and bFGF.

Migration and plasminogen activator (PA) activity assays
have also been described. The migration of bovine aortic
endothelial cells (BAEs) into a denuded area of a monolayer is
inhibited by TGF-β (ED50 = 2 ng/ml; 80 pM; sensitivity 10-20
pg/ml; 0.4-0.8 pM) (Sato et al., J. Cell Biol., 107:1199-1205
(1988); Sato et al., J. Cell Biol., 109:309-315 (1989); and
Sato et al., J. Cell Biol., 111:757-763 (1990)). Migration of
BAEs, however, can be simultaneously stimulated by endogenously
or exogenously supplied bFGF that can abrogate TGF-β's
inhibitory effect (Sato et al., J. Cell Biol., 107:1199-1205
(1988)). The PA assay for measurement of TGF-β concentration
is very sensitive and rapid (Flaumenhaft et al., J. Cell.
Physiol., 152:48-55 (1992)). The assay is based on the ability
of TGF-β to decrease PA activity of BAEs by inhibiting PA
synthesis and secretion and by inducing expression of its
inhibitor, PAI-1. This assay, however, is also sensitive to
other molecules, such as bFGF, that can alter PA activity
(Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992) and
Sato et al., J. Cell Biol., 107:1199-1205 (1988)). The ED50 of
the assay varies from 1 to 35 pg/ml (0.04-1.4 pM) of TGF-β
depending on differences in basal PA levels and sensitivity to
TGF-β among primary BAE cultures.

The ability of TGF-β to stimulate PAI-1 expression has
recently been used to study TGF-β receptors. Wrana et al.,
Cell, 71:1003-1014 (1992) transiently transfected a PAI-1
luciferase construct together with a human type II TGF-β
receptor expression vector into TGF-β resistant MLE cells.
This luciferase construct contained a short, synthetic TGF-β
response element based on the human PAI-1 promoter and was used
to report functional expression of the receptor. Although only
used to screen transfected mutant cell lines, this construct
appeared to be less sensitive to TGF-β than the constructs of
this invention when transiently transfected into MLE cells, and
no information was reported regarding its dose-responsiveness
or specificity.

In another study of the TGF-β stimulation of PAI-1
expression, Riccio et al., Mol. Cell. Biol., 12:1846-1855
(1992), transiently transfected TGF-β responsive cells with
constructs containing varying regions of the 5'-flanking domain
of the human PAI-1 gene to determine the transcription
regulatory mechanism used by TGF-β. All the constructs
contained the gene encoding the enzyme chloramphenicol
acetyltransferase to provide for an indirect determination of
the transcriptional effect of the various constructs. With
this approach, a 67 base pair region was identified that
contained binding sites for two proteins, CCAAT-binding
transcription factor-nuclear factor I family and USF factor.
Both sites were necessary to obtain TGF-β induction. The
constructs, however, were not utilized in assays to determine
dose-responsiveness or to measure the amount of TGF-β in a
sample.

The most specific assays for TGF-β are the radioreceptor,
radioimmunoassay (RIA), and enzyme-linked immunosorbent assay
(ELISA). Radioreceptor assays using a variety of cell types,
such as A549 human lung carcinomas and murine AKR-2B, have
been described and have ranges of 125 pg/ml to 25 ng/ml (5 pM-1
nM) with ED50s of approximately 0.5 ng/ml (20 pM). See
Wakefield et al., J. Cell Biol., 105:965-975 (1987); Sato et
al., J. Cell Biol., 111:757-763 (1990); Lucas et al., in
Peptide Growth Factors, Barnes et al., Eds, Academic Press Inc.,
198:303-316 (1991) and O'Connor-McCourt et al., J. Biol. Chem.,
262:14090-14099 (1987). RIAs specific for TGF-β1 and β2 have
ED50s of 12 and 37 pM, respectively (Danielpour et al., J. Cell.
Physiol., 138:79-86 (1989)). Others, using different
antibodies, describe the range of TGF-β1 specific RIAs to be
6.25-200 ng/ml (0.25-8 nM), with a sensitivity of 2.4 ng/ml
(0.1 nM) (Lucas et al., in Peptide Growth Factors, Barnes et
al., Eds, Academic Press Inc., 198:303-316 (1991)). As
demonstrated by the differences in these results, the
affinities of the antibodies can greatly alter the sensitivity
of the assay.

Isoform-specific double antibody or sandwich ELISAs
(SELISA) are also very sensitive to the affinities of the
antibodies. One such assay, using two different monoclonal
antibodies specific for TGF-β1, had a useful range of 0.63 to
40 ng/ml (0.025-1.6 nM) (Lucas et al., in Peptide Growth
Factors, Barnes et al., Eds, Academic Press Inc., 198:303-316
(1991)). [...] combination of isoform-specific turkey and
[...] (Danielpour et al., J. Cell. Physiol., 138:79-86 (1989))
[...] (20-50 pg/ml; 0.8-2 pM). Although highly sensitive and
specific, SELISAs such as these are not readily available and
are expensive.

Although all of these other TGF-β assays can detect mature
TGF-β, the low concentrations (<2 pM) generated in various
biological systems make many of them impractical without prior
concentration of the sample. This can result in large losses
of the mature growth factor or, more importantly, activation of
latent TGF-β. Moreover, many of the assays are complicated to
establish and can be influenced by other factors present in the
samples, thus reducing their utility for accurately measuring
the amount of TGF-β in the sample. For this reason, a need
exists for a relatively simple, sensitive and nonconfounding
assay for TGF-β.

Brief Description of the Invention

A highly sensitive and specific, non-radioactive assay
for mature (active) TGF-β has now been developed. When
compared to the sensitive and widely used proliferation-based
MLEC method for measuring TGF-β concentration, the TGF-β assay
method of this invention is more rapid, has comparable
sensitivity, and has a greater detection range. Specificity of
this novel assay was also higher as evidenced by its relative
insensitivity to factors such as EGF and bFGF which can greatly
affect other assays. Through the use of a truncated PAI-1
promoter that does not respond to other growth modulators such
as PDGF found in biological samples, the method of this
invention can be used in conditions where other bioassays are
difficult to interpret. Because of its large range and
specificity, the rapid, sensitive, non-radioactive, easily
performed assay method of this invention is useful in
determining active TGF-β concentrations in complex solutions.

Thus, the present invention overcomes the limitations of
existing methods used to quantify the amount of TGF-β in a
liquid sample. This invention contemplates a method for
quantifying the amount of TGF-β in a sample using a system
comprising a TGF-β responsive cell containing an expression
vector having a regulatory region comprising a TGF-β response
element operatively linked to a promoter and having a
structural region encoding an indicator molecule. Following
TGF-β induced activation of the TGF-β response element,
transcription results in the expression of an indicator
molecule, the amount of which allows for the measurement of the
amount of TGF-β responsible for the induced activation.

In particular, one embodiment of the invention
contemplates a method for quantifying the amount of TGF-β in a
liquid sample, which method comprises:

(a) incubating the liquid sample together with eucaryotic
cells that contain a TGF-β responsive expression vector having
a gene encoding luciferase for a predetermined time period
sufficient for the eucaryotic cells to express a detectable
amount of the luciferase;

(b) measuring the amount of the luciferase expressed
during the time period; and

(c) determining the amount of TGF-β present in the sample
by comparing the measured amount of the luciferase against a
reference curve.

The invention further contemplates that the reference
curve represents a quantitative relationship derived from a
series of measured amounts of luciferase produced from a series
of known concentrations of TGF-β.
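By way of illustration only, the comparison of a measured luciferase signal against a reference curve in step (c) could be sketched as a linear interpolation between standards. This is a hypothetical sketch, not the procedure of the Examples; the function name, the use of piecewise-linear interpolation, and the standard values shown in the usage note are all assumptions.

```python
# Hypothetical sketch: read a TGF-beta concentration off a reference curve
# by piecewise-linear interpolation between standards. The interpolation
# scheme is an assumption, not taken from the specification.
def interpolate_concentration(rlu, standards):
    """Estimate concentration (pM) for a measured signal (RLU).

    standards: list of (concentration_pM, rlu) pairs measured from known
    TGF-beta concentrations; the signal is assumed to rise with dose.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are required")
    pts = sorted(standards, key=lambda p: p[1])
    if not pts[0][1] <= rlu <= pts[-1][1]:
        raise ValueError("signal outside the range of the reference curve")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pts, pts[1:]):
        if s_lo <= rlu <= s_hi:
            if s_hi == s_lo:
                return c_lo
            frac = (rlu - s_lo) / (s_hi - s_lo)
            return c_lo + frac * (c_hi - c_lo)
```

For example, with invented standards of (0 pM, 1000 RLU), (1 pM, 3000 RLU), (2 pM, 5000 RLU) and (4 pM, 9000 RLU), a sample reading of 4000 RLU falls halfway between the 1 pM and 2 pM standards. Signals outside the range of the standards are rejected rather than extrapolated.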

Another embodiment of the invention contemplates a method
for quantifying the amount of transforming growth factor-β
(TGF-β) in a liquid sample comprising:

(a) providing, in eucaryotic cells capable of expressing
an indicator molecule, a plasmid comprising, in the direction
of transcription, a regulatory region that includes at least
one TGF-β inducible response element that is operatively linked
to a promoter, and a structural region downstream of the
promoter, where the response element is capable of inducing
dose-dependent indicator molecule activity and where the
structural region codes for the indicator molecule;

(b) incubating the liquid sample with the eucaryotic
cells for a predetermined time period sufficient for the
eucaryotic cells to express a detectable amount of the
indicator molecule;

(c) measuring the amount of the indicator molecule
expressed during the time period; and

(d) comparing the measured amount of the indicator
molecule produced in step (c) with the amount of indicator
molecule produced in a control assay performed according to
steps (a) through (c) by treating the liquid sample with an
anti-TGF-β antibody to obtain a net measured amount of the
indicator molecule induced by TGF-β.
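The comparison in step (d) amounts to subtracting the signal of the antibody-treated control from the signal of the untreated sample. A minimal sketch follows, assuming replicate wells are averaged and negative differences are clamped to zero; both of these choices, and the function name, are assumptions made for illustration.

```python
# Hypothetical sketch of step (d): the net indicator signal attributed to
# TGF-beta is the untreated signal minus the antibody-treated control.
# Averaging replicates and clamping at zero are illustrative assumptions.
from statistics import mean


def net_indicator_signal(sample_replicates, antibody_control_replicates):
    """Return the mean sample signal minus the mean antibody-control signal."""
    net = mean(sample_replicates) - mean(antibody_control_replicates)
    return max(net, 0.0)
```

The net value, rather than the raw signal, would then be the quantity attributed to TGF-β in the sample.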

Contemplated for use with the methods of this invention
are plasmids having identifying characteristics of plasmids on
deposit with ATCC having the ATCC Accession Numbers 75627,
75628 and 75629. Also contemplated are stably transformed
eucaryotic cells that contain the TGF-β response element having
the nucleotide sequence in SEQ ID NO 11 where the cells
correspond to cells on deposit with ATCC having the ATCC
Accession Number CRL 11508.

The invention describes plasmids for use in the methods
that comprise a nucleotide sequence corresponding to nucleotide
sequences listed in SEQ ID NOs 1-10. TGF-β inducible response
elements that comprise a nucleotide sequence corresponding to
nucleotide sequences listed in SEQ ID NOs 11-17 are also
described. Contemplated promoter nucleotide sequences are
listed in SEQ ID NOs 18 and 19.

A further embodiment of the methods of the invention uses
eucaryotic cells that are stably transformed cells containing a
plasmid having a gene encoding a selectable marker for the
selection of said stably transformed cells. The invention
describes such plasmids having nucleotide sequences listed in
SEQ ID NOs 1-6. The invention further describes a stably
transformed eucaryotic cell on deposit with ATCC having ATCC
Accession Number CRL 11508 containing the TGF-β response
element having the nucleotide sequence in SEQ ID NO 11.

An additional embodiment uses eucaryotic cells that are
transiently transformed with plasmids corresponding to
the nucleotide sequences listed in SEQ ID NOs 7-10.

The invention describes quantifying the amount of TGF-β in
a body fluid, in culture medium, and in a tissue extract. A
further preferred embodiment is the determination of the amount
of a specific isoform of TGF-β, specifically TGF-β1, TGF-β2 or
TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the
use of mammalian cells. Preferred mammalian cells include mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells, and NIH 3T3 cells.

A preferred indicator molecule also described for use with
the methods of this invention is a chemiluminescent molecule,
preferably luciferase.

The invention describes a composition of a plasmid vector
capable of causing expression of an indicator molecule in a
eucaryotic cell, where the plasmid contains nucleotide
sequences comprising a regulatory region that includes at least
one TGF-β inducible response element operatively linked to a
promoter, a structural region downstream of said promoter and
coding for said indicator molecule, and a gene encoding a
selectable marker for the selection of a stably transformed
cell, where the response element is capable of inducing dose-
dependent luciferase activity.

In preferred embodiments, plasmids with selectable marker
genes have the nucleotide sequences corresponding to SEQ ID NOs
1-6. Preferred TGF-β inducible response elements for use in
the expression vectors of this invention have the nucleotide
sequences corresponding to SEQ ID NOs 11-17.

A further preferred embodiment of the expression vectors
of this invention is the use of the neomycin gene for selecting
stable transformants, the nucleotide sequence of which is
listed in SEQ ID NO 20.

The invention further describes plasmids lacking a
selectable marker gene having the identifying characteristics
of plasmid ATCC Accession Numbers 75627, 75628, 75629,
corresponding to SEQ ID NOs 8-10, respectively.

The invention describes a eucaryotic cell containing a
plasmid having a nucleotide sequence listed in SEQ ID NOs 1-10.

Also contemplated are kits useful in assaying the amount of
TGF-β in a liquid sample comprising (a) packaging material;
(b) eucaryotic cells capable of expressing an indicator
molecule and containing a plasmid of this invention; and (c) an
aliquot of TGF-β, where the latter is used for generating a
reference curve.

Other embodiments will be apparent to one skilled in the
art.

Brief Description of the Drawings

Figure 1 shows the structure and construction of the
p800neoLuc expression vector. p800Luc was digested with AccI
and blunt-ended. pMAMneo was then digested with Sal I and Eco
RI, blunt-ended, and the fragment containing the neomycin-
resistance gene (neo^r) was ligated to the linearized p800Luc to
form p800neoLuc. Clones were analyzed via restriction enzyme
mapping and one clone with the proper insert was selected.
(MCS, multiple cloning site; PA1, 2, 3, polyadenylation regions
1, 2, and 3). The details of the construction are described in
Example 1A.

Figure 2A, having an inset (Figure 2B), shows the dose-
dependent induction of the plasminogen activator inhibitor-
1/luciferase (PAI/L) construct in p800neoLuc expression vector
in stably transformed MLE cells by TGF-β1, TGF-β2, and TGF-β3.
The TGF-β assay was performed as described in Example 3 with
DMEM-BSA containing the indicated concentrations in picomolar
(pM) of recombinant (r) TGF-β1 (closed squares), TGF-β2 (closed
circles), or TGF-β3 (closed triangles) on the X-axis. The
amount of expressed luciferase detected by a luminometer is
plotted on the Y-axis and is expressed in relative light units
(RLU). The results shown in Figures 2A, 2B and 2C are
described in Example 3B. Figure 2B shows the treatment of
p800neoLuc-transformed MLE cells with all three TGF-β isoforms
in a TGF-β assay that resulted in a linear dose-response over
the range of 0 to 4 pM of TGF-β. In Figure 2C, the TGF-β assay
was performed with 8 pM rTGF-β1, TGF-β2 or TGF-β3 in DMEM-BSA
in the presence (cross-hatched bars) or absence (open bars) of
100 µg/ml of anti-TGF-β1, TGF-β2 and TGF-β3 monoclonal antibody.
Baseline induction is indicated by medium alone (filled bars).

Figures 3A, 3B, 3C and 3D show the effects of medium, cell
density and incubation time on sensitivity of the TGF-β assay
as described in Example 3B with the amount of TGF-β1 plotted on
the X-axis in pM against the measured RLU on the Y-axis. In
Figure 3A, the assay was performed with increasing rTGF-β1
concentrations in DMEM (closed squares), alpha-MEM (closed
circles), CMEM (closed triangles: Eagles MEM supplemented with
non-essential amino acids) or RPMI-1640 (closed diamonds: Bio-
Whittaker). All media contained 0.1% BSA. In Figure 3B,
increasing concentrations of rTGF-β1 in DMEM, 0.1% BSA were
measured using 3.2 x 10^4 (closed squares), 1.6 x 10^4 (closed
circles), or 0.8 x 10^4 (closed triangles) clone 32 (C32) of
mink lung epithelial cells/well (MLE cells) after a three hour
attachment period. Samples were incubated with the cells for
14 hours prior to assaying for luciferase activity. In Figures
3C and 3D (an inset in Figure 3C), 1.6 x 10^4 C32 cells were
allowed to attach for 3 hours prior to addition of the
indicated concentrations of rTGF-β1. The samples were
incubated for 6 (closed squares), 14 (closed circles), or 22
(closed triangles) hours prior to assaying for luciferase
activity. The results are described in Example 3B.

Figures 4A and 4B show the effects of growth factors on
the TGF-β assay and MLEC assay while Figure 4C shows the
effects caused by serum. For all figures, either the growth
factors or TGF-β are plotted on the X-axis against the RLU on
the Y-axis. In Figure 4A, the TGF-β assays were performed with
DMEM-BSA containing the indicated concentrations of rTGF-β1
(closed squares), recombinant human bFGF (closed circles),
recombinant IL-1alpha (closed triangles), recombinant PDGF-BB
(closed diamonds), or EGF (open squares). In Figure 4B, TGF-β
assays were performed with DMEM-BSA containing 1 pM rTGF-β1
(closed squares) and the indicated concentrations of
recombinant human bFGF (closed circles), recombinant IL-1alpha
(closed triangles), recombinant PDGF (closed triangles), or EGF
(open squares). The assays and results are described in
Example 3C. In Figure 4C, TGF-β assays were performed with
DMEM-BSA containing the indicated concentrations of rTGF-β1
alone (closed squares) or with 0.5% (closed circles), 1%
(closed triangles), or 2% (closed diamonds) calf serum. The
assays and results are described in Example 3D.

Figure 5 shows the comparison of CMs assayed by the TGF-β
(shown as the PAI/L assay) and MLEC assays. DMEM BSA (closed
squares), COS (X-marked lines), BSM (closed triangles) or BAE
(closed circles) cell conditioned medium (CM) with the
indicated concentrations of rTGF-β1 were assayed by PAI/L (TGF-
β) assay (broken line) as measured by RLU on the right-hand Y-
axis and MLEC (unbroken line) assay as measured by tritiated
thymidine (3H-thymidine) incorporation percent of controls as
described in Example 3E. The data points were normalized to
DMEM-BSA.

Figure 6 shows the effects of growth factors on DNA
synthesis as measured by 3H-thymidine incorporation percent of
control. In the graph, DMEM-BSA containing rTGF-β1 (closed
squares), TGF-β2 (closed circles), TGF-β3 (closed triangles),
recombinant human bFGF (closed diamonds), recombinant IL-1alpha
(open squares), EGF (open circles), or recombinant PDGF-BB
(open triangles) were separately assayed using the MLEC assay
as described in Example 3C.

Detailed Description of the Invention
A. Definitions

Recombinant DNA (rDNA) Molecule: A DNA molecule
produced by operatively linking two DNA segments. Thus, a
recombinant DNA molecule is a hybrid DNA molecule comprising at
least two nucleotide sequences not normally found together in
nature. rDNA's not having a common biological origin, i.e.,
evolutionarily different, are said to be "heterologous".

Vector: A rDNA molecule capable of autonomous replication
in a cell and to which a DNA segment, e.g., gene or
polynucleotide, can be operatively linked so as to bring about
replication of the attached segment. Vectors capable of
directing the expression of genes encoding for one or more
polypeptides are referred to herein as "expression vectors".

Upstream: In the direction opposite to the direction of
DNA transcription, and therefore going from 5' to 3' on the
non-coding strand, or 3' to 5' on the mRNA.

Downstream: Further along a DNA sequence in the direction
of sequence transcription or read out, that is traveling in a
3'- to 5'-direction along the non-coding strand of the DNA or
5'- to 3'-direction along the RNA transcript.

Reading Frame: Particular sequence of contiguous
nucleotide triplets (codons) employed in translation that
define the structural protein encoding portion of a gene, or
structural gene. The reading frame depends on the location of
the translation initiation codon.
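The dependence of the reading frame on the position of the initiation codon can be illustrated with a short sketch. The function below is hypothetical and simplified: it assumes an ATG initiation codon on the sense strand, takes the first occurrence of it, and does not stop at termination codons.

```python
# Illustrative sketch only: split a sense-strand DNA sequence into the
# triplets of the reading frame set by the first initiation codon.
# The ATG default and the function name are assumptions.
def codons_in_frame(seq, start_codon="ATG"):
    """Return the contiguous triplets read 5' to 3' from the start codon."""
    seq = seq.upper()
    start = seq.find(start_codon)
    if start == -1:
        return []  # no initiation codon, so no frame is defined
    # Step three nucleotides at a time; a trailing partial triplet is dropped.
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
```

For the made-up sequence CCATGGCTTAAGG, the frame begins at the ATG at the third nucleotide, giving the triplets ATG, GCT and TAA; shifting the initiation codon by one or two nucleotides would yield an entirely different set of triplets.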

Response Element: Also referred to as an enhancer
element, is a short DNA sequence that occurs further upstream
than the upstream promoter element. Response elements contain
specific nucleotide sequences recognized by transcription
factors that are DNA-binding proteins.

Promoter: A region on a DNA molecule, generally from 100
to 200 base pairs long, upstream from the coding sequence; an
area to which the RNA polymerase initially binds prior to the
initiation of transcription. The nucleotide sequence of the
promoter, or at least part of it, determines the nature of the
polymerase that associates with it. Certain consensus
sequences, CAT and TATA boxes, within the promoter region are
important for binding of RNA polymerase.

Regulatory Region: A DNA control module upstream from the
coding sequence containing an upstream promoter element and
response elements, the latter of which are also referred to as
enhancer elements.

Growth Factor: A small protein that binds to a receptor
for controlling cell proliferation.

Receptor: A molecule, such as a protein, glycoprotein and
the like, that can specifically (non-randomly) bind to another
molecule. Receptors of one type are plasma membrane proteins
that bind specific molecules including growth factors,
hormones, or neurotransmitters, resulting in the transmission
of a signal to the cell's interior causing the cell to respond
in a specific manner.

Sense Strand: A nucleotide sequence referred to as a
sense strand of a double-stranded deoxyribonucleic acid
sequence is the nucleotide sequence that when read in the 5' to
3' direction by the genetic code defines an amino acid sequence
of interest. Alternatively, sense strand is referred to as a
coding strand.
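The relationship among the sense (coding) strand, the non-coding strand and the RNA transcript described in the definitions above can be sketched as follows. This is an illustrative example only; the function names are hypothetical and the input is assumed to contain only the bases A, C, G and T.

```python
# Illustrative sketch only; function names are hypothetical.
# Assumes the input is a sense-strand DNA sequence written 5' to 3'
# and containing only A, C, G and T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def non_coding_strand(sense):
    """Return the complementary (non-coding) strand, written 5' to 3'."""
    return sense.upper().translate(_COMPLEMENT)[::-1]


def transcript(sense):
    """Return the RNA transcript, which matches the sense strand with U for T."""
    return sense.upper().replace("T", "U")
```

For the made-up sense strand ATGGCT, the non-coding strand written 5' to 3' is AGCCAT and the transcript is AUGGCU. Moving downstream therefore means moving 5' to 3' along the transcript and the sense strand, and 3' to 5' along the non-coding strand.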

B. Transforming Growth Factor-β (TGF-β)

Transforming growth factor-β, hereinafter referred to
as TGF-β, is a growth inhibitor that exhibits a diversity of
biological activities in addition to its effects on cellular
proliferation. TGF-β belongs to a large family of related
molecules with a wide range of regulatory activities as
described in the Background. For review, see Barnard et al.,
Biochim. Biophys. Acta, 1032:79-87 (1990), the disclosure of
which is hereby incorporated by reference.

As previously discussed, TGF-β is produced and secreted
from cells in three distinct molecular isoforms, the
genes of which are located on different chromosomes, that have
been identified in mammals and are designated TGF-β1, TGF-β2 and
TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et
al., Proc. Natl. Acad. Sci. USA, 85:71-72 (1988); and Madisen
et al., DNA, 7:1-8 (1988). Each of the isoforms is
synthesized as high molecular weight latent or inactive
precursor polypeptides that are then processed to 12.5 kD
monomers that then dimerize to form biologically active, also
referred to as mature, TGF-β.

The activation process must occur to allow binding of the
dimerized TGF-β to the high affinity TGF-β receptors expressed
on the surfaces of all normal cells and almost all neoplastic
cells. Tucker et al., Proc. Natl. Acad. Sci. USA, 81:6757-
6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000
(1984); Pircher et al., Biochem. Biophys. Res. Commun., 136:30-
37 (1986).

TGF-β has been shown to induce increased secretion of
the inhibitor, plasminogen activator inhibitor-1 (PAI-1) (Laiho
et al., J. Biol. Chem., 262:17467-17474 (1987)). PAI-1 is the
primary inhibitor of both tissue-type plasminogen activator (t-
PA) and urokinase-type plasminogen activator (u-PA), and as
such is a potent anti-fibrinolytic molecule. As a consequence
of PAI-1 induction by TGF-β, the activity of plasminogen
activator (PA) is decreased. The resulting cascade of
activation of plasminogen to plasmin is thereby inhibited,
which in turn inhibits the subsequent degradation of fibrin.

While induction of PAI-1 synthesis by TGF-β has been shown
to occur primarily at the level of transcription following the
TGF-β receptor-ligand interaction, the mechanism of activation
of the PAI-1 promoter resulting in the transcription of the
PAI-1 gene is less well understood. Studies of PAI-1 gene
transcription have shown that the signal transduction mechanisms
are independent of de novo protein synthesis as determined by
the lack of inhibition by cycloheximide and rapid onset of
induction as described by Sawdey et al., J. Biol. Chem.,
264:10396-10401 (1989), the disclosure of which is hereby
incorporated by reference. The TGF-β-induced enhancement of
promoter activity for the alpha 2 collagen gene has been shown
to be mediated by a binding site for nuclear factor-I as
described by Sporn et al., J. Cell Biol., 105:1039-1045 (1987).

As shown in Example 4, the PAI-1 promoter contains AP-1-like
nucleotide sequences which are bound by the AP-1 heterodimeric
transcription factor complex of Fos and Jun protein subunits.
Although AP-1-like DNA enhancer sites are present in PAI-1, as
shown in Example 4, activation of these sites by the AP-1
heterodimeric complex was independent of the TGF-β-mediated
induction of PAI-1 synthesis.

Although neither the exact transcriptional mechanism of PAI-1
promoter activation following TGF-β receptor-ligand interaction
nor the identity of the responsible TGF-β-related transcription
factor is known, the activation of a TGF-β response element of
this invention following TGF-β occupancy of the TGF-β receptor
will be referred to as TGF-β-induced activation. Since the
TGF-β response element is activated by TGF-β, resulting in the
induction of indicator protein expression, the TGF-β response
element is also referred to as a TGF-β inducible response
element.


C. TGF-β Response Elements

The present invention is based on the discovery that when
eucaryotic cells, transformed with a TGF-β-responsive
expression vector of this invention, were exposed to liquid
samples of TGF-β, the resulting expression of an indicator
molecule was dose-dependent in relationship to the amount of
TGF-β present in the sample. Thus, the present invention
provides a method to quantify the amount of TGF-β in a liquid
sample by measuring the amount of indicator molecules
expressed.

The induced expression of the indicator molecules was the
result of activation of TGF-β response elements present in the
regulatory region of the TGF-β responsive expression vectors,
the latter of which are described in Section D.

In practicing this invention, the regulation of transcription
in the TGF-β responsive expression vector-transformed
eucaryotic cells is dependent on TGF-β. As described above, the
TGF-β occupation of the TGF-β receptor expressed on the surface
of cells results in the activation of a TGF-β-related
transcription factor. In general, transcription factors are
site-specific DNA-binding proteins. Typically positioned 5' to
a structural gene is a region of nucleotide sequences that is
responsible for controlling transcription. This region has been
termed the "control module".

The control module comprises two categories of regulatory
sequences, the promoter element and the enhancer elements. The
promoter is referred to as an upstream promoter as it lies
upstream of the structural genes. Promoter elements are
usually 100 to 200 base pairs long and the segment of DNA is
relatively close to the site of initiation of transcription. A
particular sequence recognized by one of several transcription
factors that are known to bind to the promoter region is the
TATA box, a region that is rich in A-T base pairs.

The enhancer regions are also referred to as response regions
or response elements. Thus the term "TGF-β response element"
can also be designated "TGF-β enhancer", "TGF-β enhancer
region", or "TGF-β response region", and the like. The
enhancer region is hereinafter referred to as a response
element. Response elements are short DNA segments that occur
further upstream from the initiator site than the upstream
promoter element. Response elements contain specific sequences
that are recognized by transcription factors. The response
elements are often a few thousand base pairs 5' to the promoter
but may even be 20,000 base pairs or more distant.

The binding of a transcription factor to a nucleotide sequence
comprising either a response element or a promoter resembles an
"on switch". In the context of the present invention, the
binding of the TGF-β-related transcription factor results in
the dose-dependent activation of the promoter, resulting in the
transcription of a structural region gene from DNA into RNA. In
most cases, the resulting RNA molecule serves as a template for
synthesis of a specific molecule, such as the indicator
molecule of this invention.

Thus, "activation" of a TGF-β response element refers to a
process whereby the functional state of the TGF-β response
element is altered. The result of the TGF-β activation of the
TGF-β response element is an increase in the transcriptional
efficiency of the structural gene driven from the promoter.

A further embodiment of a TGF-β response element is that it is
inducible. The term "inducible" refers to an enhancement of a
particular function. In this invention, the functional activity
of a TGF-β response element is increased or induced following
activation by the TGF-β-related transcription factor. Thus, the
TGF-β response element is also referred to as a TGF-β inducible
response element.

The result of TGF-β response element activation is the
coordinate transcription and translation of the structural
region containing a gene encoding an indicator protein of this
invention as described in Section D. The resulting expression
of an indicator molecule is dose-dependent in relationship to
the amount of TGF-β present in the sample.

The term "dose-dependent" refers to the functional
relationship between the amount of TGF-β activating the TGF-β
response element and the resulting expression of the indicator
molecule. Thus, the functional relationship between TGF-β
activation and expression of an indicator molecule can be
referred to as a linear relationship. Because of the
dose-dependent expression of an indicator molecule, such as
luciferase, in response to TGF-β exposure, the amount of TGF-β
responsible for the activation of the expression can be readily
determined using the methods of this invention.
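
The linear, dose-dependent relationship described above can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function names, the standard doses, and the signal values are hypothetical and are not taken from this disclosure, and a straight-line fit is assumed to hold only over the range spanned by the standards.

```python
from statistics import mean


def fit_standard_curve(doses, signals):
    """Least-squares line: signal = slope * dose + intercept.

    `doses` are known TGF-beta amounts of the standards and `signals`
    are the matching indicator readings (hypothetical units).
    """
    if len(doses) != len(signals) or len(doses) < 2:
        raise ValueError("need at least two paired standards")
    d_bar, s_bar = mean(doses), mean(signals)
    sxx = sum((d - d_bar) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("standards must span more than one dose")
    sxy = sum((d - d_bar) * (s - s_bar) for d, s in zip(doses, signals))
    slope = sxy / sxx
    intercept = s_bar - slope * d_bar
    return slope, intercept


def estimate_dose(signal, slope, intercept):
    """Invert the fitted line to estimate the dose behind a reading."""
    if slope <= 0:
        raise ValueError("a dose-dependent response needs a positive slope")
    return (signal - intercept) / slope


# Hypothetical standards: dose in arbitrary units, signal in light units.
standard_doses = [0.0, 1.0, 2.0, 4.0]
standard_signals = [100.0, 300.0, 500.0, 900.0]

slope, intercept = fit_standard_curve(standard_doses, standard_signals)
unknown_dose = estimate_dose(700.0, slope, intercept)
print(f"slope={slope:.1f} intercept={intercept:.1f} dose={unknown_dose:.2f}")
```

With these made-up standards the fitted line is signal = 200 x dose + 100, so a reading of 700 corresponds to a dose of 3 in the same arbitrary units. Readings outside the range of the standards should not be extrapolated with this sketch.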

Thus, based on the teachings herein, a TGF-β response element
nucleotide sequence is characterized by its ability to be
responsive to TGF-β-induced activation. Such a TGF-β response
element is useful herein as a component in the expression
vectors of this invention to provide for the ability to
quantify the amount of TGF-β responsible for the
transcriptional activation. Thus, a TGF-β response element of
this invention comprises any nucleotide sequence that is
activated by TGF-β, the process of which is as described in
Section B.

In the context of this invention, the term nucleotide sequence
refers to a plurality of joined nucleotide units formed from
naturally or non-naturally occurring bases and cyclofuranosyl
groups joined by phosphodiester bonds. Thus, the nucleotide
sequence includes the use of nucleotide analogs.

One embodiment of a TGF-β response element of this invention
is an isolated double-stranded deoxyribonucleic acid molecule
comprising a sequence of nucleotide bases that defines a TGF-β
response element. However, it is necessary neither that the
obtained TGF-β response element be a naturally occurring
sequence present in other genes nor that the TGF-β response
element be limited to deoxyribonucleotides. The TGF-β response
element may be found in DNA or RNA, in regulatory sequences,
exons, or introns.

Preferred TGF-β response elements are derived from selected
regions of the promoter region of the plasminogen activator
inhibitor type 1 gene, hereinafter referred to as PAI-1, as
described by Loskutoff et al., Biochem., 26:3763-3768 (1987),
the disclosure of which is hereby incorporated by reference.
Loskutoff et al. describe a cosmid containing the entire PAI-1
gene. In a related study, the glucocorticoid regulation of the
PAI-1 promoter was described by van Zonneveld et al., Proc.
Natl. Acad. Sci. USA, 85:5525-5529 (1988), the disclosure of
which is hereby incorporated by reference. The sequence of the
PAI-1 promoter beginning at nucleotide position -800, extending
through the TATA box and initiation site, and ending at
nucleotide position +200, the latter of which corresponds to
the PAI-1 encoded protein at the ninth amino acid residue, is
available in the GenBank™/EMBL Data Bank with Accession Number
J03836.

Moreover, Bosma et al., J. Biol. Chem., 263:9129-9141 (1988),
have described the entire 15,867 bp PAI-1 gene sequence
including significant stretches of DNA that extend into its 5'-
and 3'-flanking DNA regions, the nucleotide sequence of which
is available in the GenBank™/EMBL Data Bank with Accession
Number J03764.

The PAI-1 promoter-derived TGF-β response elements for use in
this invention are identified by the nucleotide positions
corresponding to the region in the PAI-1 promoter as listed in
the GenBank™/EMBL Data Bank Accession Number J03836.

Exemplary TGF-β response elements derived from the PAI-1
promoter have the nucleotide sequences listed in the Sequence
Listing in SEQ ID NOs 11-17. The nucleotide sequences are
listed showing only the sense strand in the 5' to 3' direction
of a double-stranded isolated TGF-β response element nucleotide
sequence. The PAI-1-derived TGF-β response elements
corresponding to SEQ ID NOs 11-17 have the respective
designations with the nucleotide regions corresponding to the
PAI-1 promoter indicated in parentheses: 1) SEQ ID NO 11 =
1500 (-1481 to -40); 2) SEQ ID NO 12 = 800 (-800 up to -40); 3)
SEQ ID NO 13 = 800/636 (-800 up to -636); 4) SEQ ID NO 14 = 56
(-56 to -41); 5) SEQ ID NO 15 = 674 (-674 to -650); 6) SEQ ID
NO 16 = 743 (-743 to -708); and 7) SEQ ID NO 17 = 732 (-732 to
-708).

In one embodiment, a TGF-β response element useful for
practicing the present invention may be derived from any
promoter nucleotide sequence. In a further embodiment, a TGF-β
response element may be designed to contain preselected
nucleotide bases. In other words, a subject TGF-β response
element need not be identical to the nucleotide sequence of the
PAI-1-derived TGF-β response elements described herein, so long
as the nucleotide sequence is activatable by TGF-β.

A TGF-β response element of this invention thus may contain a
variety of nucleotide units of any length, typically from about
5 to about 2000 nucleotides in length. More preferably, a TGF-β
response element comprises nucleotide units from about 15 to
about 1500 nucleotides in length.

A preferred embodiment is a TGF-β response element having a
nucleotide sequence that is greater than 50 base pairs in
length. Exemplary long TGF-β response elements derived from
PAI-1 are listed in the Sequence Listing in SEQ ID NOs 11-13.

Another preferred embodiment is a TGF-β response element having
a nucleotide sequence that is less than 50 base pairs in
length. Exemplary short TGF-β response elements derived from
PAI-1 are listed in the Sequence Listing in SEQ ID NOs 14-17.
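
The grouping of the exemplary elements into long and short embodiments can be checked against the promoter coordinates given above. The Python sketch below is illustrative only. It assumes that each coordinate range is inclusive at both ends (the phrase "up to" in the designations may instead denote an exclusive bound), and the table and function names are hypothetical.

```python
# SEQ ID NO -> (designation, 5' position, 3' position) in the PAI-1
# promoter, copied from the designations listed in this section.
PAI1_RESPONSE_ELEMENTS = {
    11: ("1500", -1481, -40),
    12: ("800", -800, -40),
    13: ("800/636", -800, -636),
    14: ("56", -56, -41),
    15: ("674", -674, -650),
    16: ("743", -743, -708),
    17: ("732", -732, -708),
}


def element_length(start, end):
    """Inclusive length in base pairs for two upstream (negative) positions.

    Assumption: both ends are included. Ranges that cross the
    transcription start site are rejected because promoter numbering
    conventionally has no position 0.
    """
    if not (start <= end < 0):
        raise ValueError("expected upstream positions with start <= end < 0")
    return end - start + 1


def classify(length_bp, threshold_bp=50):
    """Label an element 'long' (> threshold) or 'short' (otherwise)."""
    return "long" if length_bp > threshold_bp else "short"


def summarize(elements=PAI1_RESPONSE_ELEMENTS):
    """Return {SEQ ID NO: (designation, length in bp, 'long' or 'short')}."""
    summary = {}
    for seq_id, (name, start, end) in elements.items():
        length_bp = element_length(start, end)
        summary[seq_id] = (name, length_bp, classify(length_bp))
    return summary


for seq_id, (name, length_bp, label) in summarize().items():
    print(f"SEQ ID NO {seq_id}: {name:8s} {length_bp:5d} bp  {label}")
```

Under the inclusive-range assumption, SEQ ID NOs 11-13 come out longer than 50 base pairs and SEQ ID NOs 14-17 shorter, which agrees with the long and short groupings stated above.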

In one embodiment, the invention contemplates the presence of
at least one TGF-β response element in the regulatory region of
the expression vectors as described in Section D. Thus, one or
more stretches of a nucleotide sequence comprising a TGF-β
response element may be present within a regulatory region. If
more than one TGF-β response element is present, they are not
required to be identical. In other words, TGF-β response
elements having different nucleotide sequences as well as
different lengths can be combined in a regulatory region of an
expression vector of this invention.

TGF-β response elements can be derived or produced from the
PAI-1 promoter by truncation or expansion of the native or
wild-type PAI-1 promoter nucleotide sequence or as a variant of
the native PAI-1 promoter by site-directed substitution of a
preselected nucleotide base or bases.

Also contemplated in this context are regulatory regions
containing multiple TGF-β response elements that can be longer,
shorter, tandemly arranged, reversed in orientation, or
permutations thereof. The design and construction of such
arrangements are well known to one of ordinary skill in the art
of oligonucleotide design and synthesis and are described by
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, pp 390-401 (1982).

It is also contemplated that nucleotide base modifications
can be made resulting in nucleotide analogs to provide certain
advantages to the TGF-β response elements of this invention.

The term nucleotide analog refers to moieties that function
similarly to nucleotide sequences in a TGF-β response element
of this invention but which have non-naturally occurring
portions. Thus, nucleotide analogs can have altered sugar
moieties or inter-sugar linkages. Exemplary are the
phosphorothioate and other sulfur-containing species, analogs
having altered base units, or other modifications consistent
with the spirit of this invention.

Preferred modifications include, but are not limited to, the
ethyl or methyl phosphonate modifications disclosed in U.S.
Patent No. 4,469,863 and the phosphorothioate modified
deoxyribonucleotides described by LaPlanche et al., Nucl. Acids
Res., 14:9081 (1986) and Stec et al., J. Am. Chem. Soc.,
106:6077 (1984), the disclosures of which are hereby
incorporated by reference. These modifications provide
resistance to nucleolytic degradation. Preferred modifications
are the modifications of the 3'-terminus using the
phosphorothioate (PS) sulfurization modification described by
Stein et al., Nucl. Acids Res., 16:3209 (1988).

TGF-β response elements comprising nucleotide sequences can be
obtained by a variety of procedures well known in the art,
including de novo chemical synthesis of complementary
oligonucleotides and derivation of nucleic acid fragments from
native nucleic acid sequences existing as genes, or parts of
genes, in a genome, plasmid, or other vector, such as by
restriction endonuclease digestion of larger nucleic acid
fragments and strand separation or by enzymatic synthesis using
a nucleic acid template.

De novo chemical synthesis of oligonucleotides can be carried
out, for example, by the phosphotriester method described by
Matteucci et al., J. Am. Chem. Soc., 103:3185 (1981), or as
described in U.S. Patent No. 4,356,270, the disclosures of
which are hereby incorporated by reference. A particularly
preferred method is the phosphoramidite method using commercial
automated synthesizers, such as the ABI automated synthesizer
by Applied Biosystems, Inc. (Foster City, CA).
Oligonucleotides can be purified after synthesis using
published procedures as described by Miller et al., J. Biol.
Chem., 255:9659 (1980). Thereafter, complementary
oligonucleotides are hybridized to form double-stranded DNA
segments that are TGF-β response elements. Particularly
preferred chemically-synthesized oligonucleotides are described
in Example 1C, the sense strands of which are listed in SEQ ID
NOs 14-17, as described above.

Derivation of a TGF-β response element from nucleic acids
involves the cloning of a nucleic acid into an appropriate host
by means of a cloning vector, replication of the vector and
therefore multiplication of the amount of the cloned nucleic
acid, followed by isolation of subfragments of the cloned
nucleic acids. For a description of subcloning nucleic acid
fragments, see Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, pp 390-401
(1982); and see U.S. Patent Nos. 4,416,988 and 4,403,036.

In one embodiment, TGF-β response elements are obtained by
restriction digestion of cloned vectors containing the PAI-1
promoter as described in Examples 1A and 1C. Particularly
preferred nucleotide sequences containing TGF-β response
elements as well as the minimal promoter sequence obtained in
this manner include nucleotide sequences corresponding to the
nucleotide positions in the PAI-1 promoter sequence from -1481
to +76, specifically a Kpn I/Eco RI digest, and from -800 to
+76, specifically a Hind III/Eco RI digest.

In an additional embodiment, in the practice of this
invention, it is not necessary that the TGF-β response element
nucleotide sequence be known in order to obtain a TGF-β
response element capable of being activated by TGF-β. To that
end, contemplated for use in this invention are TGF-β response
elements obtained from promoter regions of other genes that can
be determined to contain TGF-β response elements using the
methods of this invention.

D. TGF-β Responsive Plasmid Expression Vectors

The present invention contemplates TGF-β responsive plasmid
expression vectors in substantially pure form capable of
causing expression of an indicator molecule in a eucaryotic
cell. The term "TGF-β responsive" identifies an expression
vector of this invention that by its composition contains TGF-β
response elements that are activated by TGF-β mediated through
a TGF-β response element specific transcription factor as
described in Section C. Vectors capable of directing the
expression of genes to which they are operatively linked are
referred to herein as "expression vectors".

As used herein, the term "vector" refers to a nucleic acid
molecule capable of transporting between different genetic
environments another nucleic acid to which it has been
operatively linked. One type of preferred vector is an
episome, i.e., a nucleic acid capable of extra-chromosomal
replication. Preferred vectors are those capable of autonomous
replication and/or expression of nucleic acids to which they
are linked.

A TGF-β expression vector of this invention is a circular
double-stranded plasmid that contains at least the following
elements: 1) a regulatory region having at least one TGF-β
response element as defined in Section C, where the regulatory
region is operatively linked to a promoter; and 2) a structural
region downstream of the promoter that contains a gene coding
for an indicator molecule of this invention.
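
The minimum composition just described can be restated as a small data model. The Python sketch below is a bookkeeping illustration only and is not part of the disclosed methods. The class names, field names, and the example vector are hypothetical; only the designations 743 and 732 and their SEQ ID NOs are taken from Section C.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResponseElement:
    designation: str   # e.g. "743", as designated in Section C
    seq_id_no: int     # SEQ ID NO of the sense strand


@dataclass
class ExpressionVector:
    name: str
    response_elements: List[ResponseElement]  # the regulatory region
    promoter: str                              # linked to the regulatory region
    indicator_gene: str                        # structural region
    selectable_marker: Optional[str] = None    # optional, e.g. "neo"

    def validate(self) -> List[str]:
        """Return a list of missing required elements (empty if complete)."""
        problems = []
        if not self.response_elements:
            problems.append("regulatory region has no TGF-beta response element")
        if not self.promoter:
            problems.append("no promoter linked to the regulatory region")
        if not self.indicator_gene:
            problems.append("structural region has no indicator gene")
        return problems

    @property
    def supports_stable_selection(self) -> bool:
        """True when a selectable marker gene is present."""
        return self.selectable_marker is not None


# Hypothetical vector combining two non-identical response elements.
example_vector = ExpressionVector(
    name="hypothetical-vector",
    response_elements=[ResponseElement("743", 16), ResponseElement("732", 17)],
    promoter="PAI-1 minimal promoter",
    indicator_gene="luciferase",
    selectable_marker="neo",
)
print(example_vector.validate() or "all required elements present")
```

The selectable marker is modeled as optional because it is introduced in a separate embodiment; the two required elements are checked by `validate`.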

In a separate embodiment, a TGF-β expression vector also
contains a gene, the expression of which confers a selective
advantage, such as a drug resistance, to the eucaryotic host
cell when introduced or transformed into those cells. A
typical eucaryotic drug resistance gene confers resistance to
neomycin, also referred to as G418 or Geneticin.

The choice of vector to which the regulatory region, promoter,
and structural region of the present invention are operatively
linked depends directly, as is well known in the art, on the
functional properties desired, e.g., replication or protein
expression, and the host cell to be transformed, these being
limitations inherent in the art of constructing recombinant DNA
molecules.

In preferred embodiments, the vector utilized includes
procaryotic sequences that facilitate the propagation of the
vector in bacteria, i.e., a DNA sequence having the ability to
direct autonomous replication and maintenance of the
recombinant DNA molecule extra-chromosomally when introduced
into a bacterial host cell. Such replicons are well known in
the art. In addition, the TGF-β expression vector of this
invention includes one or more transcription units that are
expressed only in eucaryotic cells.

The eucaryotic transcription unit consists of noncoding
sequences and sequences encoding selectable markers. The
expression vectors of this invention also contain distinct
sequence elements that are required for accurate and efficient
polyadenylation, referred to as PA1, 2 and 3 and as shown in
Figure 1. In addition, splicing signals for generating mature
mRNA are included in the vector. The eucaryotic TGF-β
responsive expression vectors contain viral replicons, the
presence of which provides for the increase in the level of
expression of cloned genes. A preferred replication sequence
is provided by the simian virus 40 or SV40 papovavirus.

Operatively linking refers to the covalent joining of
nucleotide sequences, preferably by conventional phosphodiester
bonds, into one strand of DNA, whether in single- or
double-stranded form. Moreover, the joining of nucleotide
sequences results in the joining of functional elements such as
response elements in regulatory regions with promoters and
downstream structural regions as described herein.

A preferred eucaryotic expression vector of this invention as
prepared in Example 1 contains a regulatory region having TGF-β
response elements derived from the 5' promoter end of the
human plasminogen activator inhibitor type 1 (PAI-1) gene
operatively linked to the PAI-1 minimal promoter and a
downstream structural region containing a gene coding for an
indicator polypeptide, preferably luciferase.

Exemplary TGF-β responsive expression vectors include the
following expression vectors, the designations of which are
indicated along with the corresponding SEQ ID NO in which the
sense strand of the expression vector is listed, where the
first nucleotide of the double-stranded circular vector is the
middle nucleotide present in the Eco RI restriction site as
described in Example 1: 1) p800neoLuc (SEQ ID NO 1); 2)
p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO 3); 4)
p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5); 6)
p732neoLuc (SEQ ID NO 6); 7) p56Luc (SEQ ID NO 7); 8) p674Luc
(SEQ ID NO 8); 9) p743Luc (SEQ ID NO 9); and 10) p732Luc (SEQ
ID NO 10).

The exemplary TGF-β expression vectors of this invention are
derived from the starting cloning expression vector, designated
p19Luc, as described in Example 1. The nucleotide sequence of
the sense strand of an Eco RI-linearized p19Luc vector is
listed in the Sequence Listing as SEQ ID NO 21.

A further embodiment of this invention is the preparation of
TGF-β responsive expression vectors having altered arrangements
of and selected types of TGF-β response elements in the
regulatory region. To that end, p19Luc and the p19Luc-derived
p39Luc expression cloning vectors, both of which are described
in Example 1, are vectors that allow for the construction of
TGF-β responsive vectors having any selected regulatory region
operatively ligated to a selected promoter. Therefore, any
regulatory region of any length containing one or more TGF-β
response elements can be paired with any promoter, such as, but
not limited to, the non-TGF-β responsive PAI-1 promoter or the
heterologous HBV promoter used herein, to prepare TGF-β
responsive expression vectors that provide for the quantitation
of inducing TGF-β.

In a related embodiment, in addition to the construction
methods detailed herein, other methods of preparing
p19Luc-derived expression vectors having TGF-β response
elements and promoters are familiar to one of ordinary skill in
the art of vector construction and are described by Ausubel et
al., in Current Protocols in Molecular Biology, Wiley and Sons,
New York (1993) and by Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, 1989.

A preferred embodiment is a TGF-β responsive expression vector
having a gene encoding a selectable marker providing for stably
transformed cells. Stably transformed cells provide a
reproducible source for practicing the methods of this
invention over a course of time. A preferred selectable marker
gene is the gene conferring neomycin resistance. Such a gene
encoding the selectable marker was derived from an expression
vector, designated pMAMneo, as described in Example 1. The
nucleotide sequence of the neomycin-resistance conferring gene
is listed in SEQ ID NO 20.

In one embodiment, a TGF-β responsive expression vector
contains a first nucleotide sequence comprising a regulatory
region that includes at least one TGF-β inducible response
element operatively linked to a promoter, a second nucleotide
sequence comprising a structural region downstream of the
promoter and coding for an indicator molecule, and a third
nucleotide sequence comprising a gene encoding a selectable
marker for the selection of a stably transformed cell, where
the response element is capable of inducing dose-dependent
luciferase activity and the structural region codes for
luciferase.

Preferred expression vectors containing the
neomycin-resistance conferring gene include the following
designations followed in parentheses by the corresponding SEQ
ID NO in which the sense strand of each Eco RI-linearized
vector is listed according to the convention adopted in this
invention for listing vector sequences: 1) p800neoLuc (SEQ ID
NO 1); 2) p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO
3); 4) p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5);
and 6) p732neoLuc (SEQ ID NO 6).

In a further embodiment, the plasmid expression vectors of
this invention contain TGF-β inducible response elements that
correspond to a nucleotide sequence listed in SEQ ID NOs 11-17
as described in Section C.

Preferred promoters for use in the expression vectors of this
invention for stably transforming cells as well as for
transient transformation are the PAI-1 minimal promoter
sequence and the hepatitis B virus minimal promoter sequence,
the sense sequences of which are respectively listed in SEQ ID
NOs 18 and 19. Contemplated for use in this invention are
promoters that are not responsive to TGF-β. The selection of
alternative promoters is within the scope of one having
ordinary skill in the art.

This invention contemplates additional TGF-β expression
vectors for stably transforming cells that can be designed to
have regulatory regions that contain alternative TGF-β response
elements and promoters.

a. Regulatory Region

The regulatory region of a TGF-β expression vector of this
invention contains at least one TGF-β response element as
described herein and in Section C of this invention. As
contemplated for use in this invention, the regulatory region
of a TGF-β expression vector can range in length from 5 to 2000
base pairs, preferably 15 to 1500 base pairs, and can contain
more than one TGF-β response element in any orientation and
arrangement. Thus, if two or more TGF-β response elements are
present in a regulatory region, they may be contiguous with one
another or separated by an intervening nucleotide sequence.
The design and construction of such arrangements are well known
to one of ordinary skill in the art of oligonucleotide design
and synthesis and are described by Sambrook et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, pp
390-401 (1982).

Preferred TGF-β response elements present in the regulatory
region of a TGF-β expression vector are derived from the PAI-1
promoter and have the nucleotide sequences listed in the
Sequence Listing in SEQ ID NOs 11-17. The PAI-1-derived TGF-β
response elements corresponding to SEQ ID NOs 11-17 have the
respective designations with the nucleotide regions
corresponding to the PAI-1 promoter indicated in parentheses:
1) SEQ ID NO 11 = 1500 (-1481 to -40); 2) SEQ ID NO 12 = 800
(-800 up to -40); 3) SEQ ID NO 13 = 800/636 (-800 up to -636);
4) SEQ ID NO 14 = 56 (-56 to -41); 5) SEQ ID NO 15 = 674 (-674
to -650); 6) SEQ ID NO 16 = 743 (-743 to -708); and 7) SEQ ID
NO 17 = 732 (-732 to -708).

b. Structural Region

A plasmid vector of the present invention contains a
structural region having a nucleotide sequence that encodes an
indicator molecule. The structural region is operatively
linked to the regulatory region such that the inducible
promoter of the regulatory region, under the inducible control
of the TGF-β response element, controls transcription and
expression of the indicator molecule. Thus, upon induction of
the TGF-β response element, the regulatory region directs
transcription and thereby expression of the indicator molecule,
resulting in a detectable event in the cell, which event can be
measured by detection of the amount of the expressed indicator
molecule. In other words, the response element is capable of
inducing the expression of the indicator molecule by virtue of
its controlling expression of the indicator through the
promoter to which the response element is operatively linked.
Typically, the structural region is "downstream" of the
regulatory region in the plasmid, and positioned to be under
the direct control of the regulatory region. Other
configurations can be utilized so long as the induction of the
TGF-β response element results in the expression of the
indicator polypeptide. Exemplary and preferred configurations
are described in the Examples.

The term "indicator molecule" as used in this invention refers
to a molecule encoded by a reporter gene, the expression of
which in the expression vectors of this invention results in a
detectable, measurable protein, polypeptide, enzyme and the
like. Alternative expressions for indicator molecule are
reporter molecule, reporter polypeptide, indicator protein,
indicator polypeptide and the like. In preferred embodiments,
the indicator molecule is a protein.

There are a variety of indicator polypeptides suitable for use
in the present invention, and the invention need not be limited
to any particular indicator. A preferred indicator polypeptide
is luciferase encoded by the firefly luciferase gene. Use of
the luciferase gene for expression of luciferase has been
described by Gould et al., Anal. Biochem., 7:5-13 (1988) and
Brasier et al., BioTechniques, 7:1116-1122 (1989). A preferred
structural region includes a nucleotide sequence having the
sequence characteristics of the luciferase gene shown in SEQ ID
NO 21. Alternative embodiments include indicator proteins such
as β-galactosidase and chloramphenicol acetyltransferase (CAT).
Use of β-galactosidase and CAT as reporter molecules has been
described respectively by Luskin et al., Neuron, 1:635-647
(1988) and Gorman et al., Mol. Cell. Biol., 2:1044-1051 (1982).

Associated with the use of an indicator molecule in
quantifying TGF-β are means for measuring the indicator
molecule. A preferred method for detecting the luciferase
indicator molecule is the use of a luminometer commercially
available from Dynatech Laboratories Inc., Chantilly, VA, as
described in Example 3A and operated according to the
manufacturer's instructions. For detecting CAT activity, a
simple phase-extraction assay has been developed and described
by Seed et al., Gene, 67:271-277 (1988), the disclosure of
which is hereby incorporated by reference. Alternative
preferred methods for detecting CAT activity are described in
Current Protocols in Molecular Biology, Eds. Ausubel et al.,
Unit 9.0, John Wiley & Sons (1993). β-galactosidase activity is
measured in activity assays performed essentially as described
by Miller, Experiments in Molecular Genetics, Cold Spring
Harbor Laboratory, New York (1972), the disclosure of which is
hereby incorporated by reference. With β-galactosidase,
additional reagents are required to visualize its presence
following induced expression. Such additional reagents for
β-galactosidase include o-nitrophenyl-β-D-galactopyranoside and
the like for the development of a color reaction measured by
absorbance at wavelengths of 500 and 420 nm.

c. Selectable Marker Gene 

In preferred embodiments, the plasmid vector of the present
invention includes a gene that encodes a selectable marker that
is effective in a eucaryotic cell, preferably a drug resistance
selection marker. A preferred drug resistance selection marker
is a gene whose expression results in neomycin resistance,
i.e., the neomycin phosphotransferase (neo) gene [Southern et
al., J. Mol. Appl. Genet., 1:327-341 (1982)] or a gene whose
expression results in kanamycin resistance, i.e., the chimeric
gene containing the nopaline synthetase promoter, Tn5 neomycin
phosphotransferase II and nopaline synthetase 3' non-translated
region described by Rogers et al., Methods for Plant Molecular
Biology, A. Weissbach and H. Weissbach, eds., Academic Press,
Inc., San Diego, CA (1988). Other selectable markers that are
utilizable in eucaryotic cells can be used in the present
vectors and methods, and therefore the invention need not be
limited to any particular selectable marker. Thus, the
invention contemplates the use of a nucleotide sequence that
confers a eucaryotic selection means, including but not limited
to genes for resistance to neomycin and kanamycin.

A preferred nucleotide sequence defining a selectable marker
gene is a nucleotide sequence having the sequence
characteristics of the neomycin resistance gene shown in SEQ ID
NO 20.

The use of a selectable marker for eucaryotic cells provides
the advantage of producing stably transformed cells, as
discussed herein. Thus, one can produce a eucaryotic cell line
containing a plasmid vector of this invention for use in the
present methods wherein all the cells of the culture are
selected to be uniform and each contains intact plasmid vector,
thereby assuring that all of the eucaryotic cells in the
culture are substantially similar in responsiveness to TGF-β
and increasing the reliability and sensitivity of the assay.

In addition, preferred embodiments that include a procaryotic
replicon also include a gene whose expression confers a
selective advantage, such as drug resistance, to the bacterial
host cell when introduced into those transformed cells.
Typical bacterial drug resistance genes are those that confer
resistance to ampicillin or tetracycline.

Those vectors that include a procaryotic replicon also
typically include convenient restriction sites for insertion of
a recombinant DNA molecule of the present invention. Typical
of such vector plasmids are pUC8, pUC9, pBR322 and pBR329,
available from BioRad Laboratories (Richmond, CA); pPL and
pKK223, available from Pharmacia (Piscataway, NJ); and
pBLUESCRIPT and pBS, available from Stratagene (La Jolla, CA).
A vector of the present invention may also be a Lambda phage
vector, including those Lambda vectors described in Molecular
Cloning: A Laboratory Manual, Second Edition, Maniatis et al.,
eds., Cold Spring Harbor, NY (1989).

Plasmid vectors for use in the present invention are also
compatible with eucaryotic cells. Eucaryotic cell expression
vectors are well known in the art and are available from
several commercial sources. Typically, such vectors provide
convenient restriction sites for insertion of the desired
recombinant DNA molecule, and further contain promoters for
expression of the encoded genes which are capable of expression
in the eucaryotic cell, as discussed earlier. Typical of such
vectors are pSVO and pKSV-10 (Pharmacia), pPW-l/PML2d
(International Biotechnology, Inc.) and pTDT1 (ATCC No.
31255).

2. Plasmid Vectors for Co-transformation and Transient Transformation

This invention contemplates the use of TGF-β responsive
expression vectors having regulatory, promoter and structural
regions but lacking a gene encoding a selectable marker. In
other words, in practicing this invention, TGF-β expression
vectors for transient transformation of eucaryotic cells are
contemplated. This embodiment provides an alternative to
stable transformation of cells for use in practicing the
methods of this invention. Transiently transformed cells
produced as described in Example 2D are useful for performing
TGF-β assays when stably transformed cells are not required.
As described in Example 4, transiently transformed cells are
useful for determining the nucleotide sequence of TGF-β
response elements as well as for quantifying the amount of
TGF-β present in a heterogeneous or homogeneous liquid sample.

Preferred TGF-β expression vectors used for transiently
transforming eucaryotic cells include the following vectors,
shown with their designations and SEQ ID NOs, in which the
sense strand of the double-stranded Eco RI-linearized vectors
is listed: 1) p56Luc (SEQ ID NO 7); 2) p674Luc (SEQ ID NO 8);
3) p743Luc (SEQ ID NO 9); and 4) p732Luc (SEQ ID NO 10).

The invention further describes TGF-β responsive plasmids
lacking a selectable marker gene and having the identifying
characteristics of plasmids that have been deposited with the
American Type Culture Collection, Rockville, MD, and assigned
ATCC Accession Numbers 75627, 75628 and 75629, the plasmids of
which respectively correspond to the Eco RI-linearized sense
strand nucleotide sequences listed in SEQ ID NOs 8-10.

In an additional embodiment, this invention describes the
co-transformation of TGF-β expression vectors for transient
transformation in conjunction with a second expression vector
from which a selectable marker is expressed. A preferred
selectable marker expressing plasmid is RSVneo, as described in
Example 2C. This approach makes it possible to prepare stably
transformed cells using a vector that by itself confers only
transient transformation. The advantage of this approach is
that further vector constructions for inserting selectable
marker genes can be avoided while still providing stably
transformed cells for use in practicing this invention when
necessitated. Thus, eucaryotic cells that have been
co-transformed with a transient TGF-β expression vector and a
second plasmid such as RSVneo provide an alternative approach
to creating stably transformed eucaryotic cells.

Any transient TGF-β expression vector of this invention can be
used in this context. A preferred co-transformed eucaryotic
cell is the cell line Hep3B that has been co-transformed with
RSVneo and the plSOOLuc expression vector having the TGF-β
response element in SEQ ID NO 11. This stably transformed cell
line has been deposited with the American Type Culture
Collection, Rockville, MD, and has been assigned ATCC Accession
Number CRL 11508.

With the teachings of this invention, additional TGF-β
expression vectors for transiently transforming cells can be
designed to have regulatory regions that contain alternative
TGF-β response elements and promoters. In a further
embodiment, these additional vectors can be used to prepare
stably transformed cells through the use of the
co-transformation approach.

3. Recipient Cells for Transformations

Insofar as the invention describes plasmid vectors for use in
the present invention, the invention also contemplates a
eucaryotic cell containing a plasmid vector of the present
invention.

A eucaryotic cell suitable for use can be any eucaryotic cell
which expresses a TGF-β receptor on its cell surface and is
capable of induction of a TGF-β response element. There are a
variety of means to identify a suitable eucaryotic cell,
including, but not limited to, transformation by a plasmid
vector of this invention, followed by assay for expression of
the indicator polypeptide upon challenge by TGF-β.

In a preferred embodiment, this invention contemplates the use
of mammalian cells. Preferred mammalian cells include mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells, NIH 3T3 cells, and the like cells.
These and other suitable mammalian cells are widely available.
Suitable mammalian cells for use in the invention can also be
obtained from the American Type Culture Collection (ATCC;
Rockville, MD).

Introduction of a plasmid vector of the present invention into
a eucaryotic cell can be accomplished by a variety of methods
well known in the art, including, but not limited to,
transfection, transformation, electroporation, microinjection,
liposome fusion, and the like introduction methods. Such
methods are well known and are not to be considered essential
to the invention. Furthermore, the introduction of the plasmid
vector can be transient or stable.

A transient introduction is one where there is no selection to
maintain the plasmid vector within the host eucaryotic cell
through multiple rounds of cell division. Therefore, the assay
is to be conducted within a short time period after
introduction, before several rounds of cell division have
occurred. Stable introduction of a plasmid involves culturing
the cell under conditions that select for the maintenance of
the plasmid vector, typically by the use of a gene on the
plasmid that encodes a selectable marker, as described further
herein.

Following the introduction of the plasmid vector, the resulting
eucaryotic cell containing a plasmid vector is used in the
assay methods described herein. A preferred eucaryotic cell
contains a plasmid vector of this invention, which plasmid
vector comprises a nucleotide sequence having a TGF-β response
element and a gene encoding an indicator polypeptide, wherein
the plasmid is capable of expression of the indicator
polypeptide in response to TGF-β induction. Particularly
preferred are eucaryotic cells that contain a plasmid vector
having a nucleotide sequence with the nucleotide sequence
characteristics of the TGF-β response element selected from the
group consisting of the sequences shown in SEQ ID NOs 11-17. A
particularly preferred eucaryotic cell contains a plasmid
vector having a nucleotide sequence with the nucleotide
sequence characteristics of the plasmid vector selected from
the group consisting of the sequences shown in SEQ ID NOs 1-10.

A preferred eucaryotic cell described further herein is 
Hep3B stably transformed with the plasmid vector plSOOLuc, 
referred to as LUCI, and having the ATCC accession No. CRL 
11508. 

E. Methods for Quantifying TGF-β

The present invention describes methods for detecting the
presence, and preferably quantifying the amount, of TGF-β in a
liquid sample containing either purified TGF-β or TGF-β in a
heterogeneous admixture; such a method is also referred to
herein as a TGF-β assay. The assay system provides for the
quantification of TGF-β through the expression of an indicator
polypeptide which is expressed in levels proportional to the
amount of TGF-β being detected.

The assay is a highly sensitive and specific, non-radioactive
assay for detecting mature (active) TGF-β. When compared to
the sensitive and widely used proliferation-based mink lung
epithelial cell (MLE cell) method for measuring TGF-β
concentration, the TGF-β assay method of this invention is more
rapid, has comparable sensitivity, and has a greater detection
range. Specificity of this novel assay is also higher, as
evidenced by its relative insensitivity to factors such as
epidermal growth factor (EGF) and basic fibroblast growth
factor (bFGF), which can greatly affect other assays. The use
of a TGF-β response element, such as the truncated PAI-1
promoter, that does not respond to other growth modulators such
as platelet-derived growth factor (PDGF) found in biological
samples provides the added advantage that the method of this
invention can be used in conditions where other bioassays are
difficult to interpret. Because of its large range and
specificity, the rapid, sensitive, non-radioactive, easily
performed assay method of this invention is useful in
determining active TGF-β concentrations in complex solutions.

Thus, the present invention overcomes the limitations of
existing methods used to quantify the amount of TGF-β in a
liquid sample. This invention contemplates a method for
quantifying the amount of TGF-β in a sample using a system
comprising a TGF-β responsive cell containing an expression
vector having a TGF-β response element and an indicator
molecule. Following TGF-β induction, transcription results in
the expression of an indicator molecule, the amount of which
allows for the measurement of the amount of TGF-β responsible
for the induction.

TGF-β receptor-bearing cells are transfected with a TGF-β
responsive expression vector of this invention, and are
subsequently exposed to TGF-β, whereupon the TGF-β
receptor-bearing cells activate the TGF-β response element in
the vector, which results in the concomitant expression of the
indicator polypeptide. The resulting expressed indicator
polypeptide is then measured in a manner depending upon the
indicator polypeptide employed.

The measured indicator polypeptide resulting from activation by
TGF-β in the test liquid sample is then compared to a
standardized reference curve produced using known amounts of
TGF-β.

In particular, one embodiment of the invention contemplates a
method for quantifying the amount of TGF-β in a liquid sample,
which method comprises:

(a) incubating the liquid sample together with eucaryotic
cells that contain a TGF-β responsive expression vector having
a gene encoding an indicator polypeptide for a predetermined
time period sufficient for the eucaryotic cells to express a
detectable amount of the indicator polypeptide;

(b) measuring the amount of the indicator polypeptide
expressed during the time period; and

(c) determining the amount of TGF-β present in the sample
by comparing the measured amount of the indicator polypeptide
against a reference curve.

Preferably, the reference curve represents a quantitative
relationship derived from a series of measured amounts of
indicator polypeptide produced from a series of known
concentrations of TGF-β.

The standardized reference curve is obtained from parallel
assays performed by exposing similarly transfected cells to a
range, usually in serial dilution, of known (measured) amounts
of one or more of the known TGF-β isoforms. The resulting
expressed indicator polypeptide is then determined by direct
detection of the indicator polypeptide. A reference curve is
then generated by plotting the measured amount of expressed
indicator polypeptide against the known range of inducing
amounts of TGF-β. The amount of unknown TGF-β in the test
liquid sample is then determined by extrapolating the measured
amount of test indicator polypeptide to the reference curve.
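By way of illustration only, reading an unknown sample off a
reference curve can be sketched as below. This is a minimal
sketch and not the procedure of the Examples: it assumes that
the indicator signal rises monotonically with TGF-β dose, it
uses simple linear interpolation between neighboring standards,
and the function name and all numerical readings are
hypothetical.

```python
from bisect import bisect_left


def read_reference_curve(signal, standards):
    """Estimate TGF-beta concentration (pM) from a measured indicator signal.

    standards: (known_concentration_pM, measured_signal) pairs obtained from
    the parallel assays. Assumption: signal increases monotonically with dose.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    # Refuse to read values that fall outside the measured standards.
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the reference curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    # Linear interpolation between the two neighboring standards.
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (pM of TGF-beta, relative light units).
standards = [(0.0, 100.0), (1.0, 300.0), (5.0, 1100.0), (25.0, 5100.0)]
estimate_pM = read_reference_curve(700.0, standards)
```

A reading outside the range spanned by the standards raises an
error in this sketch, on the assumption that such a sample
would be diluted and assayed again.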

The use of standard curves in quantifying the amount of protein
in a liquid sample in general has been described by Lowry et
al., J. Biol. Chem., 193:265-275 (1951), the disclosure of
which is hereby incorporated by reference. As shown in the
Examples herein, the TGF-β assay of this invention allows for
the measurement of TGF-β from the expression and subsequent
detection of an indicator polypeptide over a concentration
range from less than 5 picograms/ml (pg/ml), equivalent to
0.2 pM, up to 10 ng/ml, equivalent to 400 pM (or 0.4 nM). The
dose-dependent response to TGF-β is linear between 0.2 pM and
100 pM, depending on the assay conditions.
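The mass and molar concentrations quoted above can be
cross-checked with a short calculation. The sketch below
assumes the 25 kD molecular mass given for the TGF-β homodimer
in the Background; the function name is hypothetical.

```python
TGFB_MOLECULAR_MASS_DA = 25_000.0  # assumed: 25 kD homodimer (Background)


def pg_per_ml_to_pM(pg_per_ml, mass_da=TGFB_MOLECULAR_MASS_DA):
    """Convert a mass concentration in pg/ml to a molar concentration in pM."""
    # 1 pg/ml equals 1e-9 g/l; dividing by g/mol gives mol/l; 1 mol/l is 1e12 pM.
    # The two factors combine to 1000.
    return pg_per_ml * 1000.0 / mass_da


low_pM = pg_per_ml_to_pM(5.0)        # 5 pg/ml
high_pM = pg_per_ml_to_pM(10_000.0)  # 10 ng/ml expressed in pg/ml
```

Under this assumption 5 pg/ml corresponds to 0.2 pM and
10 ng/ml to 400 pM, which is 0.4 nM.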
As described further herein, any of a variety of indicator
polypeptides can be utilized in the present methods, and the
invention is not to be construed as limited to any particular
indicator polypeptide. However, a preferred embodiment
utilizes a chemiluminescent molecule, more preferably
luciferase, as the indicator polypeptide, and therefore the
examples herein using luciferase are to be considered exemplary
of all indicator polypeptides and of preferred embodiments.
The level of expressed luciferase is easily and conveniently
measured using a luminometer as described herein.

In another embodiment of the present invention, the assay
method for quantifying TGF-β in complex solutions is practiced
generally as described above, but with the additional use of a
neutralizing anti-TGF-β monoclonal antibody admixed with the
test liquid sample in assays run in parallel to untreated test
liquid samples, as described in Example 3B. These control
assays are used to determine if other molecules are present in
the test sample that can affect the assay through either
inhibition or activation of other regions of the TGF-β response
element. For example, conditioned medium obtained from cell
cultures and body fluids contain growth factors and DNA binding
proteins that function as transcriptional activators or
inhibitors. If a corresponding response element for an
additional non-TGF-β activator is present in the expression
vector, the binding of the activator to the response element
may cause enhanced or diminished expression of the indicator
polypeptide. By antibody neutralization of the TGF-β in the
test sample, any residual measured indicator polypeptide can
then be ascribed to non-TGF-β activation.

The shorter TGF-β response elements used in the expression
vector systems of this invention are less likely to have
non-TGF-β response elements, as shown in Examples 3E and 3F.
Thus, the use of parallel antibody control assays to allow for
a determination of the amount of luciferase produced from only
TGF-β activation is preferred when using expression vectors
having longer response elements or elements likely to exhibit
responsiveness to transcription factors other than those
induced by TGF-β. Moreover, while the TGF-β assay is not
generally isoform-specific, the assay can be made TGF-β
isoform-specific by the use of the appropriate standard
reference curves and parallel assays with neutralizing
antibodies immunospecific to a particular TGF-β isoform
species, thereby allowing for quantification of individual
TGF-β isoforms.

Thus, in another embodiment of the invention, a method for
quantifying the amount of transforming growth factor-β (TGF-β)
in a liquid sample is contemplated, the method comprising:

(a) providing, in eucaryotic cells capable of expressing
an indicator molecule, a plasmid comprising, in the direction
of transcription, a regulatory region that includes at least
one TGF-β inducible response element that is operably linked to
a promoter, and a structural region downstream of the promoter,
where the response element is capable of inducing
dose-dependent indicator molecule activity and where the
structural region codes for the indicator molecule;

(b) incubating the liquid sample with the eucaryotic
cells for a predetermined time period sufficient for the
eucaryotic cells to express a detectable amount of the
indicator molecule;

(c) measuring the amount of the indicator molecule
expressed during the time period; and

(d) comparing the measured amount of the indicator
molecule produced in step (c) with the amount of indicator
molecule produced in a control assay performed according to
steps (a) through (c) by treating the liquid sample with an
anti-TGF-β antibody to obtain a net measured amount of the
indicator molecule induced by TGF-β.
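The comparison in step (d) can be illustrated with a simple
subtraction. The sketch below is one possible reading and not
the analysis of the Examples: it assumes replicate wells are
averaged before subtraction, and the function name and the
readings are hypothetical.

```python
from statistics import mean


def net_tgfb_signal(untreated_replicates, antibody_replicates):
    """Mean untreated signal minus mean signal with neutralizing antibody.

    The antibody-treated wells estimate indicator expression that is not
    due to TGF-beta; the difference is the net TGF-beta-induced signal.
    """
    return mean(untreated_replicates) - mean(antibody_replicates)


# Hypothetical luminometer readings (relative light units).
untreated = [5200.0, 5000.0, 5100.0]
with_antibody = [620.0, 580.0, 600.0]
net_signal = net_tgfb_signal(untreated, with_antibody)
```

The net signal, rather than the raw untreated signal, would then
be the value compared against the reference curve.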

The use of a monoclonal antibody specific for TGF-β provides
particular advantages in practicing the invention. First, one
can use a variety of TGF-β response elements, including those
which exhibit responsiveness to factors in addition to TGF-β,
which activity is subtracted out by the use of the control data
obtained using the antibody treatment. Second, one can correct
for spurious induction or inhibition of a TGF-β response
element by factors other than TGF-β. The analysis of
comparative data produced by conducting the present method both
with and without anti-TGF-β antibody, for the purpose of
determining the level of TGF-β in a liquid sample, can be
conducted by a variety of statistical methods that are not to
be construed as limiting to the invention. Exemplary
comparative analyses are described in the Examples.

Contemplated for use with any of the above TGF-β assay methods
of this invention are plasmids having the identifying
characteristics of plasmids on deposit with ATCC having the
ATCC Accession Numbers 75627, 75628 and 75629. Also
contemplated are eucaryotic cells that contain the TGF-β
response element having the nucleotide sequence in SEQ ID NO
11, where the cells correspond to cells on deposit with ATCC
having the ATCC Accession Number CRL 11508. In one embodiment,
the use of stably transformed eucaryotic cells is contemplated.

The invention describes plasmids for use in the methods that
comprise a nucleotide sequence corresponding to nucleotide
sequences listed in SEQ ID NOs 1-10. TGF-β inducible response
elements that comprise a nucleotide sequence corresponding to
nucleotide sequences listed in SEQ ID NOs 11-17 are also
described. Contemplated promoter nucleotide sequences are
listed in SEQ ID NOs 18 and 19.

A further embodiment of the methods of the invention uses
eucaryotic cells that are stably transformed cells containing a
plasmid having a gene encoding a selectable marker for the
selection of said stably transformed cells. The invention
describes such plasmids having nucleotide sequences listed in
SEQ ID NOs 1-6. The invention further describes a stably
transformed eucaryotic cell on deposit with ATCC having ATCC
Accession Number CRL 11508 containing the TGF-β response
element having the nucleotide sequence in SEQ ID NO 11.

An additional embodiment uses eucaryotic cells that are
transiently transformed with plasmids corresponding to the
nucleotide sequences listed in SEQ ID NOs 7-10.
The use of stably transformed cells is particularly preferred
because it provides uniformity and reproducibility to the
cell-based assay without the need for the additional controls
for the efficiency of transformation typically associated with
methods using transient transformation. Stably transformed
cells do not require the use of an internal standard for
transformation efficiency, and all of the cells utilized are
typically uniformly transformed. Furthermore, the methods do
not require the additional step of transforming the cells
transiently because the stably transformed cell line is already
available.

The invention describes quantifying the amount of TGF-β in a
body fluid, in culture medium, in a tissue extract, and in the
like liquid samples. A further preferred embodiment is the
determination of the amount of a specific isoform of TGF-β,
specifically TGF-β1, TGF-β2 or TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the use of
any eucaryotic host cell that contains a TGF-β receptor and is
capable of inducing a TGF-β response element upon activation by
TGF-β. Exemplary assays for measuring activation by TGF-β and
induction of a TGF-β response element are described herein and
can be used to identify candidate host cells suitable for use
in the present diagnostic methods. A preferred host cell is a
mammalian cell. Preferred mammalian cells include mink lung
epithelial (MLE) cells, particularly clone C32 from MLE cells,
HeLa cells, Chinese hamster ovary (CHO) cells, Hep3B cells,
GM7373 cells, NIH 3T3 cells, and the like cells.

Conditions for incubating a eucaryotic cell in the present
methods are the same as general cell culture methods. Typical
cell culture media for culturing and incubating eucaryotic
cells include alpha-MEM, Eagle's MEM (having non-essential
amino acids), RPMI 1640 and Dulbecco's modified MEM (DMEM), all
of which are well known in the art. The culture medium
preferably contains 0.5 to 2% (v/v) serum, preferably a fetal
calf or fetal bovine serum (FCS or FBS). Cell culture
conditions include the use of cells plated at a density of
about 0.8 x 10^4 to about 3.2 x 10^4 cells per well of a
96-well tissue culture plate, preferably about 1.6 x 10^4 cells
per well. Cells are typically plated at the indicated density
and allowed to grow until they reach a confluence density of
from about 70% confluent to about 1 day post-confluent, but
should preferably be allowed to grow after plating for a time
period sufficient for the cells to express detectable levels of
TGF-β receptor, which time period is typically about 0.5-24
hours, preferably about 1-5 hours, and more preferably about 3
hours.
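As a worked illustration of the plating density, the number of
cells needed for a full plate and the volume of cell suspension
per well can be computed as sketched below. The suspension
concentration and the function name are hypothetical; only the
per-well density and the 96-well format come from the text.

```python
def plating_plan(cells_per_well, wells, suspension_cells_per_ml):
    """Return (total cells needed, ml of cell suspension to add per well)."""
    total_cells = cells_per_well * wells
    ml_per_well = cells_per_well / suspension_cells_per_ml
    return total_cells, ml_per_well


# Preferred density from the text: about 1.6 x 10^4 cells per well, 96 wells.
# The suspension concentration (1.6 x 10^5 cells/ml) is a hypothetical value.
total_cells, ml_per_well = plating_plan(1.6e4, 96, 1.6e5)
```

With these assumed values a full plate needs about 1.5 x 10^6
cells, delivered as 0.1 ml of suspension per well.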

After plating and culturing, the eucaryotic cells are incubated
under culturing conditions with culture medium that includes a
predetermined volume of a liquid sample believed to contain
TGF-β. The incubation time period is a time sufficient for any
TGF-β present in the liquid sample to interact with the
eucaryotic cell TGF-β receptor and thereby induce the TGF-β
response element and express the indicator polypeptide. The
time required for the expressed indicator polypeptide to
accumulate to detectable levels will vary with the choice of
indicator and method of detection, and can be predetermined.
However, typical incubation times for contacting the cell with
the liquid sample can range from 2 to 24 hours, preferably
about 6 to 22 hours, more preferably 10 to 20 hours, and
particularly about 14 hours. Particularly preferred culturing
and incubation conditions for use in the present methods are
described in the Examples.

The detection of TGF-β in liquid samples such as body fluid or
tissue extract samples is useful in following the levels of
TGF-β in patients experiencing a variety of conditions where
the TGF-β level is important to the clinician. For example,
TGF-β levels are significant in diseases characterized by
excessive fibrosis, such as hepatic fibrosis and the like, in
proliferative conditions, in conditions where there is an
increase in collagen expression, and in the like conditions
where TGF-β is believed to participate. In addition, there are
many therapeutic uses of TGF-β, and therefore the present assay
methods are useful for measuring the therapeutic fate of
administered TGF-β in patients being treated therapeutically
with TGF-β.

F. Diagnostic Methods and Kits

The present invention also contemplates a diagnostic system in
kit form for assaying the amount of TGF-β in a liquid sample
according to the present methods. The diagnostic kit contains,
in an amount sufficient for at least one assay, a eucaryotic
cell of this invention useful for practicing the diagnostic
methods for detection of TGF-β.

The kit can further contain a packaging material. Packaging
material can include container(s) for storage of the materials
of the kit, and can include a label or instructions for use.

The kit can additionally contain an aliquot of reference TGF-β
for use in generating a standard reference curve using the
methods of the invention.

Thus, in preferred embodiments, a diagnostic kit includes, in
an amount sufficient for at least one assay, the following:
(a) packaging material; (b) eucaryotic cells contained within
the packaging material, where the cells are capable of
expressing an indicator molecule and contain a plasmid
comprising, in the direction of transcription, a regulatory
region that includes at least one TGF-β inducible response
element that is operatively linked to a promoter, and a
structural region downstream of said promoter, where the TGF-β
response element is capable of inducing dose-dependent
indicator molecule activity and the structural region codes
for said indicator molecule; and (c) an aliquot of TGF-β
contained within said packaging material, where the TGF-β is
used for generating a reference curve as described herein
representing a measured amount of the indicator molecule
produced from a known concentration of TGF-β.
As used herein, the term "packaging material" refers to a solid
matrix or material such as glass, plastic, paper, foil and the
like capable of holding within fixed limits eucaryotic cells
and an aliquot of TGF-β. Thus, for example, packaging material
can be a plastic vial used to contain eucaryotic cells in
growth medium to which liquid samples can be added for
activating the TGF-β responsive plasmid within the cells.
Packaging material can also be a glass vial in which an aliquot
of TGF-β is contained for use in generating a reference curve,
the latter of which is described in Section E.

As used herein, an "aliquot" of TGF-β refers to an amount of
TGF-β sufficient to generate a reference curve of this
invention. In preferred embodiments, the aliquot of TGF-β is
provided in the form of a substantially dry powder, i.e., in
lyophilized form, for subsequent reconstitution, or in the form
of a solution, i.e., a liquid dispersion. Preferably the
amount of powdered TGF-β is in the range of 25 nanograms (ng),
more preferably 125 ng to 625 ng, and most preferably 250 ng.
Preferably the amount of TGF-β in liquid solution is in the
range of 1 to 50 nanomolar (nM), more preferably 5 to 25 nM and
most preferably 10 nM. Preferred serial dilutions of TGF-β used
in generating the reference curve are described in Section E.
The TGF-β provided in the kit preferably includes each of the
three TGF-β isoforms as described in Section B.
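The relationship between the powdered and dissolved amounts
given above can be illustrated with a short calculation. The
sketch assumes a molecular mass of 25 kD for TGF-β, as stated in
the Background; the function names and the two-fold dilution
factor are hypothetical and are not taken from the Examples.

```python
TGFB_MOLECULAR_MASS_DA = 25_000.0  # assumed: 25 kD homodimer (Background)


def reconstitution_volume_ml(mass_ng, target_nM, mass_da=TGFB_MOLECULAR_MASS_DA):
    """Volume (ml) that brings mass_ng of lyophilized TGF-beta to target_nM."""
    nmol = mass_ng / mass_da          # ng divided by g/mol gives nmol
    return nmol / target_nM * 1000.0  # nmol divided by nmol/l gives l; l to ml


def serial_dilution(start_nM, fold, steps):
    """Concentrations of a dilution series, beginning with the stock itself."""
    return [start_nM / fold ** i for i in range(steps)]


volume_ml = reconstitution_volume_ml(250.0, 10.0)  # 250 ng brought to 10 nM
series_nM = serial_dilution(10.0, 2, 4)            # hypothetical two-fold series
```

Under the 25 kD assumption, 250 ng reconstituted in 1 ml gives a
10 nM stock, so the most preferred powdered and dissolved
amounts correspond to one another.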

The term "indicator molecule or indicator polypeptide" as
used in this invention and described in Section D1 refers to a
molecule encoded by a reporter gene, the expression of which in
the expression vectors of this invention results in a
detectable measurable protein, polypeptide, enzyme and the
like.

In preferred embodiments, the packaging material includes
a label indicating that eucaryotic cells containing TGF-β
responsive expression vectors can be used for determining the
amount of TGF-β in a liquid sample by a method that includes
the steps of (a) incubating the cells with the selected liquid
sample; (b) measuring the amount of the induced indicator
molecule; and (c) comparing the amount of measured indicator
molecule with a reference curve. Thus, the packaging material
contains a label that is a tangible expression describing the
methods of this invention, as described in Section E, of using
plasmid-transformed eucaryotic cells for quantifying the amount
of TGF-β in a test liquid sample.
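
Step (c) above, comparing a measured indicator signal with a
reference curve, can be sketched as follows. This is only an
illustration: linear interpolation between neighboring standards
is an assumption chosen for simplicity, the function name is
hypothetical, and the standard values are placeholders rather
than measurements from this specification.

```python
from bisect import bisect_left


def interpolate_concentration(reference_curve, measured_signal):
    """Estimate a concentration from a reference curve by linear interpolation.

    reference_curve: list of (concentration, signal) pairs from standards.
    Assumes the signal increases strictly with concentration.
    Returns None when the measured signal lies outside the curve.
    """
    points = sorted(reference_curve, key=lambda pair: pair[1])
    signals = [signal for _, signal in points]
    if measured_signal < signals[0] or measured_signal > signals[-1]:
        return None  # no extrapolation beyond the standards
    index = bisect_left(signals, measured_signal)
    if signals[index] == measured_signal:
        return points[index][0]
    (conc_lo, sig_lo), (conc_hi, sig_hi) = points[index - 1], points[index]
    fraction = (measured_signal - sig_lo) / (sig_hi - sig_lo)
    return conc_lo + fraction * (conc_hi - conc_lo)


# Placeholder standards: (concentration, signal) in arbitrary units.
PLACEHOLDER_CURVE = [(0.0, 100.0), (1.0, 200.0), (2.0, 400.0), (4.0, 800.0)]

if __name__ == "__main__":
    print(interpolate_concentration(PLACEHOLDER_CURVE, 300.0))
```

Returning None for out-of-range signals reflects a design choice
in this sketch: a sample reading above the highest standard would
be diluted and measured again rather than extrapolated.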

The packaging materials discussed herein in relation to
the kit of this invention are those customarily utilized in
kits or diagnostic systems. Such materials include glass and
plastic, the latter of which includes polyethylene,
polypropylene and polycarbonate, bottles, vials, plastic and
plastic-foil laminated envelopes and the like.

The eucaryotic cells transformed with the TGF-β responsive
expression vectors of this invention are cells that express
TGF-β receptor on their cell surface as described in Section E.
All normal cells and almost all neoplastic cells have cell
surface membrane receptors, also referred to as binding
proteins, for TGF-β. For review, see Tucker et al., Proc. Natl.
Acad. Sci., USA, 81:6757-6761 (1984) and Frolik et al., J. Biol.
Chem., 259:10995-11000 (1984). The receptors have previously
been described in Section E. Preferred cells for use with the
TGF-β assay kit include mink lung epithelial cells (MLE cells),
HeLa cells, Chinese Hamster Ovary cells, Hep3B cells, GM7373
cells and NIH 3T3 cells, with the C32 clone from the mink lung
epithelial cells being the most preferred cell line.

In preferred embodiments, the eucaryotic cells are
transformed with the expression vector plasmids described in
Section D that have a nucleotide sequence that corresponds to a
sequence in SEQ ID NOs 1-10. Contemplated for use in the kit
are stably and transiently transformed eucaryotic cells. As
described in Section D1, for preparing stably transformed
eucaryotic cells, the plasmids corresponding to SEQ ID NOs 1-6
are preferred for use. A further preferred eucaryotic cell for
use in the kit is the Hep3B cell line co-transfected with
p1500Luc and RSVneo for preparing stably transformed cells that
have been deposited with ATCC having the ATCC Accession Number
CRL 11508 and identified by the designation "LUCI". For
preparing transiently transformed eucaryotic cells, the
plasmids corresponding to SEQ ID NOs 7-10 are preferred for
use.

In preferred embodiments, eucaryotic cells for use with
the kit contain a plasmid having the identifying
characteristics of a plasmid on deposit with ATCC having the
Accession Numbers 75627, 75628 and 75629 as described in
Section C.

The kit of this invention further includes an anti-TGF-β
antibody for use in a parallel control assay for determining
the amount of indicator molecule produced other than by TGF-β
induction. Preferred anti-TGF-β antibodies are anti-TGF-β1,
anti-TGF-β2 or anti-TGF-β3 monoclonal antibodies commercially
available from Genzyme Corp., Cambridge, MA.

Preferred diagnostic assays accomplished with the kit
performed as described herein are for the quantitation of the
amount of TGF-β in a liquid sample. A liquid sample can
include an isoform of TGF-β, specifically TGF-β1, TGF-β2 or
TGF-β3. A liquid sample further includes any body fluid,
culture medium and a tissue extract that may contain unknown
quantities of TGF-β. Thus, the liquid sample includes the body
fluids, serum, plasma, whole blood, lymph fluid, synovial
fluid, follicular fluid, seminal fluid, amniotic fluid, urine,
spinal fluid, saliva, sputum, tears, perspiration, mucus and
the like. Culture medium includes culture supernatant, also
referred to as conditioned medium, collected from cells
maintained in tissue culture as described in Example 3B.
Tissue extracts also encompass extracts of cells, referred to
as cellular extracts. In addition, organs such as placentas
can be obtained and extracted with well known procedures to
prepare placental extracts. Extracts can also be obtained of
any body organ or portion thereof, tissue or cells, including
normal, tumorigenic, and malignant cells. This is generally
accomplished by surgical means, i.e., by biopsy samples
including needle aspirates, tissue scrapings, or freshly
dissected tissues and the like. Extracts of the collected
samples are then prepared by means including homogenization in
lysis buffers, including detergents such as NP-40, Triton
X-100, and the like. Common methods include using potters,
blenders, ultrasound generators, and dounce homogenizers.

EXAMPLES

The following examples relating to this invention are 
illustrative and should not, of course, be construed as 
specifically limiting the invention. Moreover, such variations 
of the invention, now known or later developed, which would be 
within the purview of one skilled in the art are to be 
considered to fall within the scope of the present invention 
hereinafter claimed. 

1. Preparation of Expression Vectors Containing TGF-β
Response Elements

A. Starting Cloning Vector Constructs and
Preparation of Expression Vectors for Stable
Transformation

Eucaryotic expression vectors having a regulatory
region having at least one TGF-β response element derived from
the 5' promoter end of the human plasminogen activator
inhibitor type 1 (PAI-1) gene operatively linked to a PAI-1
minimal promoter and a downstream structural region containing
a gene coding for an indicator polypeptide, preferably
luciferase, were prepared and designated generally as PAI/L
eukaryotic expression constructs. Operatively linking refers
to the covalent joining of nucleotide sequences, preferably by
conventional phosphodiester bonds, into one strand of DNA
whether in single- or double-stranded form. Moreover, the
joining of nucleotide sequences results in the joining of
functional elements such as response elements in regulatory
regions with promoters and downstream structural regions as
described herein.

The expression vector constructs of this invention were
then used for preparing stably transformed cells for use in the
quantitative TGF-β assays of this invention. The expression
vectors were designed to contain varying lengths and
arrangements of the TGF-β response elements from the PAI-1
promoter, a neomycin-resistance conferring gene for selection
and a gene encoding an indicator polypeptide, preferably
luciferase. Two starting vectors were required to prepare the
expression vectors having a neomycin-resistance conferring
gene. One of these starting cloning plasmid vectors, p19Luc,
was previously described by van Zonneveld et al., Proc. Natl.
Acad. Sci., USA, 85:5525-5529 (1988), the
disclosure of which is hereby incorporated by reference.

1) Preparation of Cloning Vector p19Luc

The promoter-less reporter gene p19Luc plasmid
was originally designed by van Zonneveld et al., Proc. Natl.
Acad. Sci., USA, 85:5525-5529 (1988), to monitor promoter
activity with a structural region, having the firefly
luciferase gene to function as a reporter gene, fused to a SV40
splice and polyadenylation site. The p19Luc plasmid also
contained a multiple cloning site preceded by two SV40-derived
polyadenylation sites. The p19Luc plasmid was constructed from
pSV0AL-AΔ5', a vector described by De Wet et al., Mol. Cell.
Biol., 7:725-737 (1987). The pSV0AL-AΔ5' was first digested
with Hind III and one portion of the plasmid was blunt-ended by
filling in the Hind III sites with DNA polymerase I
large fragment (Klenow) and ligated to phosphorylated Eco RI
linkers (New England Biolabs, Beverly, MA). Of the
resulting fragments, the 621 bp fragment originally containing
the 5' end of the luciferase gene and the 2718 bp fragment
originally located on the 5' end of this fragment were
isolated. A second portion of the Hind III-cleaved pSV0AL-AΔ5'
was ligated to a 55 bp polylinker and cleaved with Eco RI. The
resulting fragment containing the multiple cloning site
and the pBR322-derived ampicillin resistance-conferring gene
was isolated. These fragments were ligated to create the
circular double-stranded p19Luc plasmid that contained the
three fragments in their original orientation but with the
multiple cloning site in the original Hind III site.

The continuous 6170 bp sense strand, also referred to as
the coding strand, nucleotide sequence of an Eco RI-linearized
p19Luc vector is listed in the Sequence Listing as SEQ ID NO
21. The convention adopted for listing the nucleotide
sequences of the p19Luc vector as well as all the expression
vectors of this invention derived from p19Luc is to list only
the sense strand of each vector with the nucleotide position 1
always beginning with the middle of the Eco RI site,
specifically the first T nucleotide.

The Eco RI-linearized p19Luc vector contained the
following list of elements and restriction sites, beginning with
the 5' middle Eco RI "T" nucleotide position 1 and extending to
the 3' end of the vector ending with the middle Eco RI "A"
nucleotide position 6170 (nucleotide positions as listed in SEQ
ID NO 21 are indicated in parentheses): a restriction
site (750-755) within the pBR322-derived ampicillin resistance-
conferring gene (amp); an Acc I restriction site downstream of
the amp gene (2113-2118); two tandem polyadenylation sites
immediately upstream of the multiple cloning site beginning
with Bam HI (2771-2776) and Hind III (2778-2783), continuing
with adjacent Sph I, Pst I, Hinc II/Acc I/Sal I, Xba I, Bam HI,
Xma I/Sma I, Kpn I, Sst I, and ending with Eco RI (2829-2834);
the luciferase gene adjacent to the Eco RI site in which are
Sph I (3522-3527) and Xba I (4564-4569); an SV40 splice site
followed by a polyadenylation site; a Bam HI restriction site
(5417-5422); and lastly a Pst I restriction site (5962-5967).

For use in preparing the expression vectors of this 

latc^f T COT " ini " 9 TCF - e response elements. the 

resultant Ch C6mPriSSd °» «*1«<»y region of the 

en h i™ th ^ f ™" - ~- response elements 

The advantage of the expression vectors of this invention
constructed using the p19Luc or p19Luc-derived vectors described
below is that the vectors allow for the construction of TGF-β
responsive vectors having a selected regulatory region
paired with a selected promoter. Therefore, any
regulatory region of any length containing one or more TGF-β
response elements can be paired with a promoter, a non-TGF-β
responsive PAI-1 or heterologous HBV promoter as used herein,
to prepare TGF-β responsive expression
vectors that provide for the quantitation of inducing TGF-β.

Whrle specific egression vector constructs having the' 

prepare 6 ; i f '^'^ ^ " «-» 
Prepared for use xn this invention, also contemplated are 

.expression vectors having regulatory regions with TSr-l 

IrZZll ela " 8ntS " e eith " lo ""«- dorter, tandemiy 

arranged, reversed, permutations thereof and the li*. 
operat.vely ligaced to a selected promoter. „oreover. in 
addxtxon to the construction methods detailed herein, othe- 
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methods of preparing pl9Luc-derived expression vectors having 
TGF-S response elements and promoters are familiar to one of 
ordinary skill in the art of vector construction and are 
described by Ausebel, et al., In Current Protocols in Molecular 
Biology, Wiley and Sons, New York (1993) and by Sambrook et 
al., Molecular Cloning: A Laboratory Manual, Cold Spring 
Harbor Laboratory, 1989. 

2) Preparation of Expression Vector p1500Luc
One expression vector of this invention,
designated p1500Luc, was constructed from p19Luc and a cosmid
containing the PAI-1 promoter in which TGF-β response elements
are located. To prepare p1500Luc, a 1547 base pair (bp) Kpn I-
Eco RI fragment of the PAI-1 promoter was obtained from a
cosmid containing the entire PAI-1 gene (Loskutoff et al.,
Biochem., 26:3763-3768 (1987), the disclosure of which is
hereby incorporated by reference), and was cloned into the Kpn I
and Eco RI sites of pUC19, a plasmid available from American
Type Culture Collection, Rockville, MD with the ATCC Accession
Number 37254, to create a vector designated pUCEK19. The
fragment contained the 1442 bp TGF-β response element (SEQ ID
NO 11) from the PAI-1 promoter that corresponded to nucleotide
position -1481 and extended to the nucleotide position -40
continuous with a 115 bp minimal (non-TGF-β responsive) PAI-1
promoter sense strand sequence (SEQ ID NO 18) corresponding to
nucleotide position -39 ending with an E. coli DNA polymerase
filled-in Eco RI site at nucleotide position +76 as
described by Bosma et al., J. Biol. Chem., 263:9129-9141
(1988). The entire 15,867 bp PAI-1 gene sequence including
significant stretches of DNA that extend into its 5'- and 3'-
flanking DNA regions was described by Bosma et al., J. Biol.
Chem., 263:9129-9141 (1988), and is available in the
GenBank™/EMBL Data Bank with accession number(s) J03764.
To create a sensitive reporter gene system with a
regulatory region having the 1442 bp TGF-β response element of
the PAI-1 promoter contiguous with the minimal PAI-1 promoter,
the pUCEK19 plasmid prepared above was then digested with Kpn I
and Eco RI and the isolated fragment was then ligated into the
multiple cloning site of a similarly digested p19Luc. The
resulting vector was designated p1500Luc.

3) Preparation of Expression Vector p800Luc

Another vector, designated p800Luc, was prepared
for subsequent construction of p800neoLuc as described below.
The p800Luc plasmid, having a deletion in the 5' end of the
PAI-1 construct so that the 5' end began with the -800
nucleotide in the native PAI-1 promoter, was prepared by
digesting the PAI-1-gene-containing cosmid described above with
Hind III and Eco RI. The actual Hind III-Eco RI digest of the
PAI-1 promoter resulted in a fragment that corresponded to
nucleotides -799 to +71 bp in the PAI-1 promoter that was
subsequently ligated into a similarly digested p19Luc vector
forming a PAI-1 region extending from nucleotide -800 to +76.
The resulting p800Luc plasmid retained all the features of
p19Luc with the exception of the insertion of the PAI-1-derived
regulatory region having a TGF-β response element and a
promoter.

The restriction fragments described to prepare p1500Luc
and p800Luc had an identical 3' end (an Eco RI site at +71
nucleotide of the PAI-1 promoter) and a different 5' end. The
vectors, p1500Luc and p800Luc, were used for transient
transformations as they lacked a selectable marker gene. The
p1500Luc plasmid was also used to prepare stable
transformations with a second vector as described in Example
1C. In addition, the p800Luc served as the starting cloning
construct for the preparation of p800neoLuc as described below.
The TGF-β response element in the -800 to +76 PAI-1 promoter
region began at -800 and ended at -40, the nucleotide sequence
of which is listed in SEQ ID NO 12. The remaining nucleotides
comprising the non-TGF-β responsive minimal promoter in this
PAI-1 fragment are listed in SEQ ID NO 18.

4) Preparation of Cloning Vector p39Luc

An expression vector, designated p39Luc, having
a promoter for activating transcription of the luciferase gene
while lacking TGF-β response elements, thereby lacking
responsiveness to TGF-β, was prepared as described by Keeton et
al., J. Biol. Chem., 266:23048-23052 (1991). A fragment of the
PAI-1 promoter (i.e., between -39 and +76), which had been
determined in the TGF-β assay as described in Example 3A to
have low basal activity and only minimal response to TGF-β
(average induction of 2.7-fold), was used as a minimal promoter
in the constructs for use in quantifying the amount of TGF-β in
a test liquid sample. Since the minimal promoter sequence
conferred only a minimal background response to TGF-β as shown
in Example 3A, the minimal PAI-1-derived promoter is also
referred to as being "non-TGF-β responsive".

Briefly, the p800Luc vector was linearized by digestion
with Hind III followed by 5' digestion of the PAI-1 promoter
with Bal-31 slow exonuclease (International Biotechnologies, New
Haven, CT) as described by Keeton et al., J. Biol. Chem.,
266:23048-23052 (1991). The digestion was allowed to proceed
until the -39 nucleotide position of the PAI-1 promoter was
reached. Thereafter, the linearized and Bal-31 digested
plasmid was ligated with T4 ligase forming a double-stranded
circular vector designated p39Luc.

The resultant expression vector, into which TGF-β response
elements were subsequently ligated as described in Example 1C,
contained the PAI-1 minimal promoter nucleotide sequence
corresponding to -39 to +76 of the promoter as listed in SEQ ID
NO 18. This minimal promoter was operatively linked to and
continuous with the structural region that contained the
firefly luciferase gene present in the vector. Since the
p39Luc cloning vector was derived from p800Luc which itself was
derived from p19Luc, the remaining elements and features of the
vector were retained unchanged from p19Luc. The 6229 bp sense
strand nucleotide sequence of the Eco RI-linearized p39Luc
vector is listed in SEQ ID NO 23.

The p39Luc cloning expression vector is also obtained by
preparing a double-stranded oligonucleotide sequence
corresponding to the sequence in SEQ ID NO 18 and ligating it
into the Hind III/Eco RI multiple cloning site of p19Luc. The
overhang from the Hind III/Eco RI digests in the p19Luc vector
is first digested with mung bean nuclease, followed by
ligation with the blunt-ended double-stranded oligonucleotide
promoter. Other construction methods are well known to and
easily accomplished by one of ordinary skill in the art.

The p39Luc vector was useful for operatively ligating
regulatory regions that contained TGF-β response elements
resulting in an expression vector that was responsive to DNA-
binding proteins, the result of which was induction of the
transcription and translation of the indicator molecule,
luciferase. TGF-β responsive expression vectors for use in
practicing this invention having TGF-β response elements other
than those specified herein are readily constructed through the
use of either p19Luc or p39Luc starting cloning expression
vectors.

5) Preparation of Cloning Vector HBVLuc
To create expression vectors having heterologous
non-TGF-β responsive promoters instead of having the PAI-1-
derived minimal promoter described above, a minimal promoter
construct derived from the Hepatitis B viral promoter (HBV) was
selected. This promoter corresponded to the nucleotide sequence
from -188 to +145 of the Hepatitis B promoter and showed only a
4-fold induction in response to TGF-β. The sense strand of the
double-stranded nucleotide sequence of the HBV minimal promoter
is listed in SEQ ID NO 19.
The 6464 bp sense strand nucleotide sequence of the Eco RI-
linearized pHBVLuc vector is listed in SEQ ID NO 25.

6) Preparation of Expression Vector
p800neoLuc

For preparing an expression vector for use in
stable transformations, the neomycin-resistance conferring gene
from pMAMneo (Clontech, Palo Alto, CA) was inserted into the
p800Luc vector containing -800 to +76 of the 5' end of the
human PAI-1 gene followed by the firefly luciferase gene. As
shown in Figure 1, p800Luc prepared above was first digested
with Acc I, repaired to blunt ends with the Klenow fragment of
DNA polymerase I, and then was isolated. The pMAMneo plasmid
was digested with Sal I and Eco RI then blunt-ended with
Klenow. The neomycin-resistance gene containing fragment was
then isolated and had the 4302 bp sense strand nucleotide
sequence listed in the Sequence Listing in SEQ ID NO 20. The
linearized p800Luc and neomycin-resistance fragment were
ligated, and one clone with the insert in the correct
orientation was selected by restriction mapping and designated
p800neoLuc. The entire Eco RI-linearized 11293 bp nucleotide
sequence of the sense strand of the double-stranded p800neoLuc
vector is listed in the Sequence Listing in SEQ ID NO 1. DNA
sequencing was performed by a modification of the dideoxy
chain-termination procedure with a Sequenase kit (United States
Biochemical, Cleveland, OH). This clone, purified from large
scale plasmid preparations via CsCl gradients, was used for
subsequent transfections.

Since the p800neoLuc cloning vector was derived from
p800Luc which itself was derived from p19Luc, the remaining
elements and features of the vector were retained unchanged
from p19Luc. The p800neoLuc vector thus contained the
neomycin-resistance conferring gene providing for stable
transformants. The p800neoLuc vector also contained an
operatively ligated regulatory region that contained a TGF-β
response element in the sequence corresponding to -800 to -40
of the PAI-1 promoter resulting in an expression vector that
was responsive to TGF-β. With this expression vector
construct, the induced activation of the transcription and
translation of the indicator molecule, luciferase, was obtained
further allowing for the quantitation of the amount of TGF-β
responsible for activating gene expression.

7) Preparation of Cloning Vector p39neoLuc
To create an expression vector useful for
constructing TGF-β responsive vectors that resulted in stably
transformed cells, the p39Luc cloning vector prepared above was
linearized as described above for p800Luc and ligated with the
neomycin-resistance conferring gene fragment from pMAMneo. The
construction of the vector was performed as described in
Example 1A6). The resultant p39neoLuc cloning expression
vector had the Eco RI-linearized 10533 bp sense strand
nucleotide sequence listed in SEQ ID NO 22. Regulatory
regions containing TGF-β response elements were operatively
ligated 5' to the minimal promoter sequence of the p39neoLuc as
described in Example 1C for the preparation of plasmids for
transient transformation.

8) Preparation of Cloning Vector pHBVneoLuc
To create an expression vector useful for
constructing TGF-β responsive vectors with a heterologous
promoter for stably transforming cells, the pHBVLuc cloning
vector prepared above was linearized as described above for
p800Luc and ligated with the neomycin-resistance conferring
gene fragment from pMAMneo. The construction of the vector was
performed as described in Example 1A6). The resultant
pHBVneoLuc cloning expression vector had the Eco RI-linearized
10768 bp sense strand nucleotide sequence listed in SEQ ID
NO 24. Regulatory regions containing TGF-β response elements
were operatively ligated 5' to the minimal promoter sequence of
the pHBVneoLuc as described in Example 1C for preparing
plasmids for transient transformation.

9) Preparation of p1500neoLuc,
p800/636neoLuc, p56neoLuc,
p674neoLuc, p743neoLuc and p732neoLuc
Expression Vectors

The p1500Luc vector prepared above is similarly
ligated with the neomycin-resistance gene from pMAMneo to form
p1500neoLuc. Other PAI-1-promoter containing expression
vectors lacking the neomycin-resistance gene, p800/636Luc,
p56Luc, p674Luc, p743Luc and p732Luc, containing smaller TGF-β
response elements were prepared as described in Example 1C. To
create the corresponding neomycin-resistance expression vectors
for stably transforming recipient cells, the neomycin-
resistance gene from pMAMneo is separately ligated with each of
these five vectors to form expression vectors used for
generating stable cell transformations. The five resultant
vectors having the neomycin-resistance gene inserted are
designated p800/636neoLuc (10697 bp), p56neoLuc (10549 bp),
p674neoLuc (10558 bp), p743neoLuc (10569 bp) and p732neoLuc
(10558 bp) and have the respective complete nucleotide
sequences of the sense strand from the Eco RI-linearized
double-stranded vectors in SEQ ID NOs 2-6.
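
As a consistency check on the vector sizes quoted above, the
following sketch adds the length of each oligonucleotide
response element to the 10533 bp size given for p39neoLuc in
Example 1A7). It assumes, as a simplification not stated in the
text, that each response element is inserted without gaining or
losing any other nucleotides. The element lengths are taken from
the promoter positions given in Example 1C, and the variable and
function names are hypothetical.

```python
# Consistency check: stated neoLuc vector sizes versus the p39neoLuc
# backbone plus the length of each inserted response element.
# Assumption: insertion adds exactly the element length and nothing else.

P39NEOLUC_BP = 10533  # Eco RI-linearized p39neoLuc (SEQ ID NO 22)

# Oligonucleotide response element lengths (Example 1C promoter positions).
ELEMENT_BP = {
    "p56neoLuc": 16,   # -56 to -41
    "p674neoLuc": 25,  # -674 to -650
    "p743neoLuc": 36,  # -743 to -708
    "p732neoLuc": 25,  # -732 to -708
}

# Vector sizes as stated in the text above.
STATED_BP = {
    "p56neoLuc": 10549,
    "p674neoLuc": 10558,
    "p743neoLuc": 10569,
    "p732neoLuc": 10558,
}


def expected_size(name):
    """Backbone size plus the response element length for one vector."""
    return P39NEOLUC_BP + ELEMENT_BP[name]


def size_discrepancies():
    """Stated size minus expected size for every vector (0 means agreement)."""
    return {name: STATED_BP[name] - expected_size(name) for name in STATED_BP}


if __name__ == "__main__":
    for vector, difference in size_discrepancies().items():
        print(vector, difference)
```

The restriction-derived p800/636 element is left out of the
sketch because its inserted length depends on the cut sites
rather than on a synthesized oligonucleotide.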

Depending on the vector into which the PAI-1 promoter
fragments were cloned, the designated names either had "Luc"
alone or "neoLuc" respectively for vectors lacking the neomycin
(neo) selectable marker gene or containing it. In addition,
the plasmids were further designated by the 5' end of the PAI-1
TGF-β response element. For example, five plasmids with
shorter TGF-β response elements were thus named p800/636neoLuc,
p56Luc, p674Luc, p743Luc and p732Luc.

As with all the expression vectors of this invention, the
operative elements from the original cloning vector p19Luc,
from which the vectors were all derived, were retained.
The above neomycin-resistance containing expression
vectors were then used in the TGF-β assay method as described
in Example 3 following transformation of host recipient cells.

B. Expression Vectors for Co-Transformation of
TGF-β Responsive Vectors and a Selectable
Marker Vector for Stable Transformation

Stably transformed Hep3B cells were also obtained as
described in Example 2B below through the use of co-
transfections of a TGF-β responsive vector lacking a selectable
marker gene of this invention, specifically the p1500Luc
prepared in Example 1A3), with a selectable marker vector,
RSVneo, available from American Type Culture Collection (ATCC),
Rockville, MD, ATCC Accession Number 37198. The stably
transformed cell line containing plasmid p1500Luc, designated
LUCI, was deposited with the ATCC on or before December 16,
1993 and was assigned the ATCC Accession Number CRL 11508.

C. Expression Vectors for Transient Transformation

Additional TGF-β responsive expression vectors were
prepared for use in this invention. In the vectors prepared as
described herein, the TGF-β response elements having a smaller
length, thereby providing responsiveness to TGF-β with reduced
or absent responsiveness to other growth modulators, were made
by either restriction digestion of the PAI-1 promoter or
synthesizing double-stranded blunt-end oligonucleotides. The
oligonucleotide sequences corresponded to preselected regions
of the PAI-1 promoter sequence. The resultant TGF-β response
elements present within a regulatory region were then
directionally ligated into p39Luc or p39HBV.

The regulatory region from the PAI-1 promoter
corresponding to nucleotide position -800 up to and including
-636 was obtained by restriction digestion and had the
following sense strand sequence:

5'AAGCTTACCATGGTAACCCCTGGTCCCGTTCAGCCACCACCACCCCACCCAGCACACCTCC
AAGCTCAGCCAGACAAGGTTGTTGACACAAGAGAGCCCTCAGGGGCACAGAGAGAGTCTGGAC
ACGTGGGGAGTCAGCCGTGTATCATCGGAGGCGGCCGGGCA3' (SEQ ID NO 13).
The additional selected regions for preparing oligonucleotides
included the following sense strand nucleotide sequences with
the indicated nucleotide positions as present in the intact
PAI-1 promoter: 1) promoter nucleotide position -56 up to and
including -41: 5'AGTTCATCTATTTCCT3' (SEQ ID NO 14); 3)
promoter nucleotide position -674 up to and including -650:
5'GTGGGGAGTCAGCCGTGTATCATCG3' (SEQ ID NO 15); 4) nucleotide
position -743 up to and including -708:
5'CTCCAACCTCAGCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 16); and 5)
nucleotide position -732 up to and including -708:
5'GCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 17). The
complementary sequences to each of the sense oligonucleotide
sequences were also synthesized to allow for the formation of
double-stranded oligonucleotides for ligation 5' to the PAI-1
minimal promoter sequence containing the TATA box.
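
The relationship between the promoter positions, the
oligonucleotide lengths and the complementary strands described
above can be illustrated with a short sketch. It assumes the
usual numbering convention in which promoter positions run from
-1 directly to +1 with no position 0, which is consistent with
the 115 bp length given for the -39 to +76 minimal promoter.
The function names are hypothetical, and only SEQ ID NOs 14 and
15 are used as examples.

```python
# Sketch: oligonucleotide lengths from promoter positions, and the
# complementary strand needed to form a double-stranded oligonucleotide.
# Assumption: promoter numbering has no position 0 (-1 is followed by +1).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sense):
    """Return the complementary strand written 5' to 3'."""
    return sense.translate(COMPLEMENT)[::-1]


def promoter_span_length(start, end):
    """Number of nucleotides from position start up to and including end."""
    if start == 0 or end == 0 or start > end:
        raise ValueError("positions must be nonzero with start <= end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses from -1 to +1; there is no position 0
    return length


# Sense strand sequences and promoter positions as listed in the text.
OLIGOS = {
    "SEQ ID NO 14": ("AGTTCATCTATTTCCT", -56, -41),
    "SEQ ID NO 15": ("GTGGGGAGTCAGCCGTGTATCATCG", -674, -650),
}

if __name__ == "__main__":
    for name, (sense, start, end) in OLIGOS.items():
        print(name, len(sense), promoter_span_length(start, end))
        print("  antisense 5'-" + reverse_complement(sense) + "-3'")
```

Under the same convention, the -1481 to -40 response element
spans 1442 positions and the -39 to +76 minimal promoter spans
115, matching the lengths given in Example 1A2).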

The resulting double-stranded oligonucleotides were then
separately operatively linked to the -39 position of this
minimal promoter sense strand sequence listed in SEQ ID NO 18
present in the expression vector, p39Luc, prepared as described
in Example 1A4). The sequences were confirmed by double-
stranded sequencing methods.

The resulting five plasmids with shorter TGF-β response
elements were thus named p800/636Luc, p56Luc, p674Luc, p743Luc
and p732Luc. The plasmids, p56Luc, p674Luc, p743Luc and
p732Luc, have the respective complete sense strand nucleotide
sequences, beginning with the middle T of the Eco RI site as
previously described, listed in SEQ ID NOs 7-10. The plasmids,
p674Luc, p743Luc and p732Luc, were deposited with ATCC as
described in Example 5 and respectively assigned the ATCC
Accession Numbers 75627, 75628 and 75629.

In similar procedures, five plasmids having a heterologous
hepatitis B viral promoter, HBV, instead of the PAI-1 minimal
promoter were prepared with the shorter TGF-β response
elements present in p800/636Luc, p56Luc, p674Luc, p743Luc and
p732Luc. The HBVLuc cloning expression vector was prepared as
described in Example 1A5). The TGF-β response elements were
ligated into linearized HBVLuc to
form TGF-β response element-containing plasmids lacking the
neomycin-resistance-conferring gene.

Furthermore, as previously mentioned, the cloning vector
constructs, p19Luc and p39Luc, provide for the operative
linking of preselected regulatory regions with preselected
promoters, both of which are not limited to the specific
constructs described herein and above. Additional TGF-β
response elements in varied lengths and arrangements along with
promoters that provide for the transcription of the reporter
gene are contemplated for use in this invention.

2. Transformation of Eucaryotic Cells with Expression
Vectors Containing TGF-β Response Elements

A. Recipient Eucaryotic Cells

To identify the cell types most responsive to TGF-β
in which to transfect the TGF-β responsive expression vectors
for use in assaying the amount of TGF-β, the vectors prepared
in Example 1 were transfected as described in Example 2B and 2C
into recipient cell lines including mink lung epithelial cells
(MLE cells) (ATCC CCL 64), HeLa cells (ATCC CCL 2), Chinese
hamster ovary (CHO cells) (ATCC CCL 61), GM7373 (chemically
transformed, fetal bovine aortic endothelial cells or BAEs)
(NIGMS Human Genetic Mutant Cell Repository, Camden, NJ), Hep3B
(ATCC HB 8064) and NIH 3T3 cells (ATCC CRL 1658).

B. Stable Transformation 

For preparing stably transfected cells for use with
expression vectors containing the pMAMneo construct prepared in
Example 1A, transfections of mink lung epithelial cells
(hereinafter referred to as MLE cells to distinguish from the
TGF-β proliferation assay called MLEC) were performed. The MLE
cells were seeded at 7 x 10^5 cells/100 mm dish for 24 hours at
which point they were transfected with the PAI/L construct,
p800neoLuc, by calcium phosphate precipitation as described by
Wigler et al., Proc. Natl. Acad. Sci., USA, 76:1373-1376
(1979). Twenty-four hours after transfection, the medium was
replaced and supplemented with 400 µg/ml of Geneticin. The
resistant cells were expanded in mass culture or cloned by
limiting dilution for further experiments. Following
selection, transfected MLE cells were maintained in DMEM
containing 10% fetal calf serum and 250 µg/ml Geneticin (G-418
sulfate) (Gibco BRL, Grand Island, NY).

Stable transformations are also performed as described
above with the expression vectors p800/636neoLuc, p56neoLuc,
p674neoLuc, p743neoLuc and p732neoLuc, all of which are
prepared as described in Example 1A.

C. Stable Transformation Obtained by Co-transfection
of Cells

For transfecting 6 wells, 15 micrograms (µg) of
p1500Luc expression vector prepared in Example 1A2) that did
not have a neomycin-resistance gene was admixed with 3 µg of a
plasmid encoding the neomycin selectable marker gene driven
from a respiratory syncytial virus promoter, RSVneo. The
RSVneo plasmid is available from ATCC with ATCC Accession
Number 37198. Hep3B cells at a concentration of 6 x 10^5
cells/well were seeded as described above in Example 1B for 24
hours, at which point they were transfected with the PAI/L
construct, p1500Luc, by calcium phosphate precipitation
followed by selection with Geneticin. The resultant cell line
stably transformed with p1500Luc, designated LUCI, was
deposited with ATCC on December 16, 1993 and was assigned the
ATCC Accession Number CRL 11508.

D. Transient Transformation

For preparing transiently transformed cells
containing TGF-β responsive expression vectors lacking the
neomycin resistance gene prepared as described in Example 1C,
Hep3B human hepatoma cells obtained from ATCC (ATCC Accession
Number HB8064) were maintained in DMEM/Ham's F-12 (Whittaker
Bioproducts, Walkersville, MD) supplemented with 10% fetal
bovine serum (Hyclone Laboratories, Logan, UT), glutamine,
sodium pyruvate, non-essential amino acids and
penicillin/streptomycin (Whittaker). For transfection
experiments, semiconfluent cells in 6-well (10 cm^2 per well)
tissue culture plates (Corning Inc., Corning, NY) were washed
twice with serum free media (DMEM/F-12) then incubated in serum
free media. Separate mixtures (50 µl/well) of lipofectin
(GIBCO, Grand Island, NY) at a concentration of 12.5 µg/well
and DNA vector constructs prepared in Example 1A-1C at a
concentration of 2.5 µg/well, each in water, were added to the
cell-containing wells and the plates were incubated for 18
hours. After lipofection, plates were incubated an additional
24 hours in the absence or presence of 1 ng/ml TGF-β provided
by Berlix Biosciences, South San Francisco, CA. The monolayers
were then washed followed by extraction into 0.25% Triton
X-100. Each construct was tested with at least 2 independent DNA
preparations in order to rule out any effects related to
differences in DNA preparation. For each experiment, two
independent transfections were performed with every construct.

3. Method for Quantifying the Amount of TGF-β in a
Sample

A. The TGF-β Assay Method

The p800neoLuc construct stably transfected into
Hep3B cells was used in the initial characterization of the
assay method as described herein. TGF-β measurement assays
performed with cells transiently transformed with the remaining
expression vectors containing TGF-β response elements are
presented in Example 4.

The TGF-β assay allows for the quantification of the
amount of TGF-β in a liquid sample, either containing purified
TGF-β or TGF-β in a heterogeneous admixture. The assay system
provides for the quantification of TGF-β through the expression
of an indicator polypeptide, such as luciferase. When TGF-β
receptor-bearing cells, transfected with a TGF-β responsive
expression vector of this invention, are exposed to TGF-β, the
activation of the TGF-β response element in the vector results
in the concomitant expression of luciferase. The resulting
expressed luciferase is isolated and then measured as described
herein. The measured luciferase resulting from activation by
TGF-β in the test liquid sample is then compared to a
standardized reference curve.

This reference curve is obtained from parallel assays
performed by exposing similarly transfected cells to a range of
known measured amounts of TGF-β, one or more of the known TGF-β
isoforms. The resulting expressed luciferase is then
determined in a luminometer. A reference curve is then
generated by plotting the measured amount of expressed
luciferase against the known range of inducing amounts of
TGF-β. The amount of unknown TGF-β in the test liquid sample is
then determined by extrapolating the measured amount of test
luciferase to the reference curve. The use of standard curves
in quantifying the amount of protein in a liquid sample in
general has been described by Lowry et al., J. Biol. Chem.,
193:265-275 (1951), the disclosure of which is hereby
incorporated by reference. As shown in the Examples herein,
the TGF-β assay of this invention allows for the measurement of
TGF-β from the expression and subsequent detection of an
indicator polypeptide over a concentration range from less than
5 picograms/ml (pg/ml), equivalent to 0.2 pM, to 10 ng/ml,
equivalent to 0.4 nM. The dose-dependent response is linear
between 0.2 pM and 30 pM, and even up to 100 pM depending on
the assay conditions.
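
By way of illustration only, the mass and molar equivalents quoted above can be checked with a short Python sketch. It assumes a molecular weight of 25 kD for the mature TGF-β homodimer, as stated in the Background; the function names are hypothetical and form no part of the assay itself.

```python
# Assumption: mature TGF-beta homodimer has a molecular weight of 25 kD.
TGF_BETA_MW_DA = 25_000.0  # g/mol


def pg_per_ml_to_pM(pg_per_ml: float, mw_da: float = TGF_BETA_MW_DA) -> float:
    """Convert a mass concentration (pg/ml) to a molar concentration (pM).

    pg/ml -> g/L is a factor of 1e-9 and mol/L -> pM is a factor of 1e12,
    so the net conversion is (pg/ml) * 1e3 / MW.
    """
    return pg_per_ml * 1e3 / mw_da


def pM_to_pg_per_ml(pM: float, mw_da: float = TGF_BETA_MW_DA) -> float:
    """Inverse conversion: molar concentration (pM) to mass concentration (pg/ml)."""
    return pM * mw_da / 1e3


if __name__ == "__main__":
    print(pg_per_ml_to_pM(5.0))       # lower limit quoted in the text, in pM
    print(pg_per_ml_to_pM(10_000.0))  # 10 ng/ml expressed in pM
```

Under this assumption, 5 pg/ml corresponds to 0.2 pM and 10 ng/ml to 400 pM (0.4 nM), in agreement with the figures given above.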

An additional aspect of the assay for quantifying TGF-β in
complex solutions was the use of neutralizing anti-TGF-β
monoclonal antibodies admixed with the test liquid sample in
assays run in parallel to untreated test liquid samples as
described in Example 3B. These control assays are used to
determine if other molecules are present in the test sample
that can affect the assay through either inhibition or
activation of other regions of the truncated PAI-1 promoter.
For example, conditioned medium obtained from cell cultures and
body fluids contain growth factors and DNA binding proteins
that function as transcriptional activators or inhibitors. If
a corresponding response element for an additional non-TGF-β
activator or inhibitor is present in the expression vector, the
binding of that molecule to the response element may cause
enhanced or diminished expression of the indicator polypeptide.
By antibody neutralization of the TGF-β in the test sample, any
residual measured luciferase can then be ascribed to non-TGF-β
activation.
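
The logic of the parallel antibody control can be sketched as follows. This is a minimal Python illustration that assumes the TGF-β and non-TGF-β contributions to the luciferase signal are simply additive, which the specification does not state and which may not hold where other factors enhance the TGF-β response; the function name and arguments are hypothetical.

```python
def tgf_beta_specific_rlu(untreated_rlu: float, neutralized_rlu: float) -> float:
    """Estimate the luciferase signal attributable to TGF-beta.

    untreated_rlu:   RLU measured for the test sample as supplied.
    neutralized_rlu: RLU measured for the same sample pre-treated with
                     neutralizing anti-TGF-beta antibody (residual,
                     non-TGF-beta activation).
    Assumption: the two contributions are additive, so the TGF-beta
    specific signal is the difference, floored at zero.
    """
    if untreated_rlu < 0 or neutralized_rlu < 0:
        raise ValueError("RLU values must be non-negative")
    return max(untreated_rlu - neutralized_rlu, 0.0)
```

The difference, rather than the untreated reading, would then be compared with the reference curve.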

The shorter TGF-β response elements used in the expression
vector systems of this invention, even including the longer
p800neoLuc, are less likely to have non-TGF-β response elements
that are bound by other DNA-binding proteins as shown in
Examples 3C-3F. Thus, the use of parallel antibody control
assays to allow for a determination of the amount of luciferase
produced from only TGF-β activation is preferred when
expression vectors having longer response elements are used.
Moreover, while the TGF-β assay is not isoform specific, using
the appropriate standard reference curves and parallel assays
with neutralizing antibodies to the various TGF-β species
allows for quantification of unique TGF-β isoforms.

The reagents used in the assays described herein, and their
sources, were as follows: recombinant human
TGF-β1 (rTGF-β1) (gift from Berlix Biosciences, South San
Francisco, CA); rTGF-β2 and neutralizing monoclonal antibodies
against TGF-β1, TGF-β2 and TGF-β3 (Genzyme, Cambridge, MA);
rTGF-β3, recombinant human interleukin-1alpha (rIL-1alpha) and
recombinant human platelet-derived growth factor-BB (PDGF-BB)
(R&D Systems, Minneapolis, MN); recombinant human basic
fibroblast growth factor (bFGF) (Synergen Inc., Boulder, CO);
epidermal growth factor (EGF) from mouse submaxillary glands
(Boehringer Mannheim Biochemicals, Indianapolis, IN);
dexamethasone, retinoic acid, and plasmin (Sigma Chemical Co.,
St. Louis, MO); thrombin (Armour Pharmaceutical Co., Kankakee,
IL); and hematopoietic factors granulocyte-colony stimulating
factor (GCSF), granulocyte-macrophage-colony stimulating factor
(GMCSF), stem cell factor, and IL-3 (Amgen, Thousand Oaks, CA).

The TGF-β quantification assay of this invention was
performed as follows: 1.6 x 10^4 stably transfected MLE cells
per well plated in 96 well tissue culture dishes were allowed
to attach for 3 hours at 37°C in a 5% CO2 incubator. The
medium was replaced with the test sample containing unknown
quantities of TGF-β, or with DMEM, 0.1% BSA (DMEM-BSA)
containing rTGF-β1, rTGF-β2, rTGF-β3, IL-1alpha, PDGF-BB, bFGF,
or EGF, for 14 hours at 37°C. Time courses of exposure to the
samples were performed to optimize the assay, as shown below.
However, in general, approximately 24 hours after addition of
the sample to the transfected cells, the cells were observed
under phase contrast microscopy. In at least one vector-
transfected cell line, Hep3B cells, the presence of TGF-β in
quantities of at least 0.1 ng/ml TGF-β in the
sample was detected visually by the change of morphology and
density of the cell population. The untreated cells remained
organized with cell size decreasing upon confluence until the
cell borders were no longer visible. In the presence of TGF-β,
the untreated cell density was never attained and the cells
were larger, flatter and less organized.

Following visual inspection, cell extracts were prepared
and assayed for luciferase activity using the enhanced
luciferase assay kit (Analytical Luminescence, San Diego, CA)
as per the manufacturer's instructions. Treated cells were
first washed twice with 2 ml phosphate-buffered saline (PBS)
without Ca++ and Mg++ and then extracted with 100 µl of 0.25%
Triton X-100 (cell lysis buffer, Analytical Luminescence). The
plates were gently shaken until the monolayer detached from the
plastic. The plates were then placed on a rotator at room
temperature for 20 minutes. Eighty µl of the resultant lysates
were transferred to a Microlight 1 96-well plate (Dynatech
Laboratories Inc., Chantilly, VA) and were analyzed using an
ML1000 luminometer (Dynatech) with 100 µl injections of both
Substrates A and B (Analytical Luminescence). Luciferase
activity was reported as relative light units (RLU) as measured
by the light generated over a ten second period. All assays
were performed in triplicate. Error bars in the collected data
represent the standard error of the mean of the samples.
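
For clarity, the mean and standard error of the mean of a set of triplicate RLU readings may be computed as in the following Python sketch. The use of the sample standard deviation (n - 1 denominator) is an assumption, since the specification does not state which estimator was used, and the readings in the usage example are illustrative values rather than measured data.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(readings: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for replicate RLU readings.

    Assumption: SEM = sample standard deviation / sqrt(n), with the
    sample standard deviation computed using an n - 1 denominator.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("at least two replicates are required")
    return statistics.mean(readings), statistics.stdev(readings) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical triplicate readings, for illustration only.
    mean_rlu, sem_rlu = mean_and_sem([100.0, 110.0, 120.0])
    print(mean_rlu, sem_rlu)
```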

To quantitate the amount of TGF-β inducing the measured
amount of luciferase from liquid samples, reference curves were
prepared from parallel assays performed by exposing similarly
transfected cells to a range of known measured amounts of
TGF-β, one or more of the known TGF-β isoforms. Serial dilutions
of the control TGF-β concentrations were prepared from a 1
nanomolar (nM) concentration down to 0.078 picomolar (pM). The
TGF-β assay was performed for each serial dilution and the
resulting expressed luciferase was then determined in a
luminometer. A reference (standard) curve was then generated by
plotting the measured amount of expressed luciferase against
each of the known concentrations of inducing amount of TGF-β.
The amount of unknown TGF-β in the test liquid sample was then
determined by extrapolating the measured amount of test
luciferase to the reference curve.
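
One way of reading an unknown sample against such a reference curve is sketched below in Python. The sketch uses piecewise-linear interpolation between bracketing standards and declines to report a value outside the range of the standards; both choices are assumptions made for illustration and are not stated in the specification. The function name and the standard values in the usage example are hypothetical.

```python
from typing import List, Optional, Tuple


def concentration_from_rlu(
    rlu: float, standards: List[Tuple[float, float]]
) -> Optional[float]:
    """Read a TGF-beta concentration (pM) off a reference curve.

    standards: (concentration_pM, mean_rlu) pairs from the parallel assay.
    Assumptions: RLU rises strictly with concentration over the standards,
    and linear interpolation between adjacent standards is adequate.
    Returns None when the reading lies outside the standards.
    """
    points = sorted(standards)
    for (_, r0), (_, r1) in zip(points, points[1:]):
        if r1 <= r0:
            raise ValueError("standard curve must be strictly increasing")
    if rlu < points[0][1] or rlu > points[-1][1]:
        return None
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 <= rlu <= r1:
            return c0 + (rlu - r0) * (c1 - c0) / (r1 - r0)
    return None


if __name__ == "__main__":
    # Hypothetical standards, for illustration only.
    curve = [(0.0, 100.0), (1.0, 300.0), (2.0, 500.0), (4.0, 900.0)]
    print(concentration_from_rlu(400.0, curve))
```

Returning no value for readings outside the standards reflects the practice of diluting such samples into the range of the curve before they are assayed again.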

B. Sensitivity of the TGF-β Assay Method

To identify the cell type most responsive to TGF-β
for use in the methods of this invention, the p800neoLuc
construct prepared in Example 1A was stably transfected as
performed in Example 2B into a variety of cell lines including
MLE cells, HeLa, Chinese hamster ovary (CHO), GM7373 cells
(chemically transformed fetal bovine aortic endothelial cells
obtained from the NIGMS Human Genetic Mutant Cell Repository,
Camden, NJ) and NIH 3T3 cells. After treatment of the
transfected cell lines with recombinantly-produced TGF-β1,
designated rTGF-β1, the cell lysates were assayed for
luciferase activity and protein content. There was a linear
relationship between the luciferase activity and the protein
content of the cell lysates between 0.7 and 14 µg for all of
the cell lines. Nontransfected parental cells demonstrated no
detectable luciferase activity. Of the various cell lines, the
transfected MLE cells demonstrated the greatest sensitivity to
TGF-β. After cloning the transfected MLE cells by limiting
dilution, cells from clone 32 (C32) were found to be the most
sensitive and were used for all subsequent assays.

C32 cells were sensitive to rTGF-β1, β2 and β3 in the
picomolar (pM) to the nanomolar (nM) range as evidenced by
increased luciferase activity in relative light units (RLU) as
shown in Figure 2A. All three isoforms, rTGF-β1, rTGF-β2 and
rTGF-β3, respectively graphed as closed squares, closed circles
and closed triangles, demonstrated good dose-dependent
responses, particularly at low TGF-β concentrations (<4 pM; 100
pg/ml) where the responses were essentially linear (Figure 2B).
rTGF-β3 was the most potent inducer of luciferase activity,
consistent with the observation that MLE cells were most
sensitive to this isoform of TGF-β as described by van
Zonneveld et al., Proc. Natl. Acad. Sci., USA, 85:5525-5529
(1988) (see also Figure 6 as described in Example 3E).
To further assess the dose-dependent responsiveness of
luciferase activity by TGF-β induction, the TGF-β assay was
performed with 8 pM of rTGF-β1, rTGF-β2 or rTGF-β3 in DMEM-BSA
in the presence (partially filled squares) or absence (open
squares) of 100 µg/ml of anti-TGF-β1, anti-TGF-β2 or
anti-TGF-β3 monoclonal antibodies (Genzyme Corp., Cambridge, MA). As
shown in Figure 2C, the induction of luciferase activity by
rTGF-β1, rTGF-β2 and rTGF-β3 was inhibited by the addition of

TGF-β assay. The cell attachment and incubation times, however,
did affect the sensitivity. When C32 cells were plated for 2,
3 or 4 hours prior to the addition of samples, a 3 hour plating
time appeared to be optimal. Shorter plating times decreased
sensitivity, whereas longer times had little effect on the
subsequent assay.

Incubation time with the sample also affected the assay.
After a three hour attachment period, 1.6 x 10^4 C32 cells were
incubated with various concentrations of rTGF-β1 ranging from 0
to 50 pM for 6 (closed squares), 14 (closed circles) or 22
hours (closed triangles) prior to assaying for luciferase
activity as shown in Figure 3C. Incubation times of 12-14 hours
were found to give the best results over the widest
concentration range. The sensitivity of cells incubated for 6
hours was not as great at higher TGF-β1 concentrations, whereas
the sensitivity of cells incubated for 22 hours was decreased
at low TGF-β1 concentrations. There also appeared to be a
slight decrease in sensitivity to TGF-β as the cells were
repeatedly passaged (>30). This phenomenon was observed for
the MLEC assay as well.

C. Specificity of the TGF-β Assay

After examining the sensitivity of the assay,
specificity of the TGF-β assay was then examined. Four known
inducers of PAI-1 expression were incubated with C32 cells and
the luciferase activity determined. The inducers tested
included basic fibroblast growth factor (bFGF) (Saksela et al.,
J. Cell Biol., 105:957-963 (1987)), platelet-derived growth factor
(PDGF-BB) (Reilly et al., J. Biol. Chem., 266:9419-9427
(1991)), interleukin-1 alpha (rIL-1alpha) (Schleef et al.,
J. Biol. Chem., 263:5797-5803 (1988)) and epidermal growth factor
(EGF) (Seebacher et al., Exp. Cell Res., 203:504-507 (1992) and
Sato et al., Exp. Cell Res., 204:223-229 (1993)). The assay
was performed as described in Example 3A with DMEM-BSA
containing rTGF-β1 (closed squares), recombinant human bFGF
(closed circles), recombinant IL-1alpha (closed triangles),
recombinant PDGF-BB (closed triangles) or EGF (open squares)
ranging in concentration from 0.1 to 500 pM. As seen in Figure
4A, even at high concentrations of these factors (500 pM),
there was little or no induction of luciferase expression
except by PDGF, which demonstrated a slight induction.

Additional inducers of PAI-1, dexamethasone (10^-7 M),
retinoic acid (1 µM), plasmin (0.1 U/ml), thrombin (1 U/ml),
and the hematopoietic factors granulocyte colony stimulating
factor (10 ng/ml; 525 pM), granulocyte-macrophage-colony
stimulating factor (10 ng/ml; 690 pM), stem cell factor (50
ng/ml; 2.7 nM) and IL-3 (10 ng/ml; 666 pM), were also tested
for their ability to induce luciferase expression in the assay
method of this invention. Only plasmin and thrombin elicited
minor elevations of luciferase activity, and these were inhibited by
the addition of aprotinin or hirudin, respectively. Of the
molecules tested in the TGF-β cell assay, only the TGF-βs
demonstrated dose-dependent increases in luciferase expression.

When these factors were tested in the presence of TGF-β1,
a slightly different pattern emerged. These assays were
performed with C32 cells maintained in DMEM/BSA containing 1 pM
rTGF-β1 (closed squares) separately admixed with each of the
growth factors, bFGF (closed circles), recombinant IL-1alpha
(closed triangles), recombinant PDGF (closed diamonds) or EGF
(open squares), ranging in concentration from 0.2 to 500 pM.
The results, graphed in Figure 4B, show that high
concentrations (500 pM) of PDGF-BB and rIL-1alpha increased the
luciferase activity above that induced by TGF-β alone. bFGF had
a similar effect that was observed at lower concentrations.
This induction, maximal at 10 pM bFGF, was abrogated by the
addition of bFGF neutralizing antibodies, and did not increase
at higher concentrations (>10 nM) of bFGF.

Because this enhancement may have resulted from a bFGF-
mediated increase in total cell number and/or protein, crystal
violet staining of parallel cultures and protein assays of the
cell lysates were performed. The normalization of the amount of
protein using these values, however, did not reduce the
luciferase activity in the bFGF plus rTGF-β1-treated cultures
to that of cells treated with rTGF-β1 alone. Interestingly,
uncloned transfected MLE cells were less sensitive to bFGF and
other factors including TGF-β.
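
The normalization referred to above can be illustrated by expressing luciferase activity per unit of lysate protein. The Python sketch below is a minimal example under the assumption that normalization was a simple ratio of RLU to protein mass; the specification does not give the formula used, and the function name is hypothetical.

```python
def rlu_per_ug_protein(rlu: float, protein_ug: float) -> float:
    """Normalize a luciferase reading to the protein content of the lysate.

    Assumption: normalization is the simple ratio RLU / micrograms of
    protein, so that differences in cell number do not inflate the signal.
    """
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return rlu / protein_ug
```

If an enhancement persists after such normalization, as reported here for bFGF, it cannot be explained by an increase in total cell protein alone.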

Additional TGF-β assays were performed using the ATCC
deposited LUCI cell line containing the p1500Luc expression
vector co-transfected with RSVneo as described in Example 2C to
determine the specificity of activation of the PAI-1 promoter
by other cell activating molecules (agents). The TGF-β assays
were performed as described in Example 3A with the exception
that the p1500Luc vector was used instead of the p800neoLuc
vector. Controls in these assays included the use of two
additional luciferase-expressing vectors that had the
vitronectin (VN) and respiratory syncytial virus (RSV) promoters
in place of the PAI-1 truncated promoter. The molecules used
in the assays included the following (the source and
concentrations are indicated in the parentheses): 1) human
recombinant IL-6 (Boehringer Mannheim, Indianapolis, IN; 500
U/ml); 2) dexamethasone (Sigma Chemical Co.; 10^-5 M); 3)
TGF-β (Berlix Biosciences; 1 ng/ml); 4) lipopolysaccharide (LPS)
(Sigma Chemical Co.; 1 ng/ml); 5) human recombinant
tumor necrosis factor alpha (TNF) (Boehringer Mannheim; 100 ng/ml);
6) human recombinant IL-1 (Sigma Chemical Co.; 50 U/ml); and
7) thrombin (NY State Department of Health, Albany, NY; 10
U/ml).

The assays were performed as indicated in Table 1, in which
the fold induction is indicated as measured by relative light
units of luciferase that resulted from the activation of either
the PAI-1, VN or RSV promoters when exposed to the various
agents.

Table 1

Agent            PAI-1    VN       RSV
Control          1X       1X       1X
IL-6             2X       15X      1X
Dexamethasone    1X       1X       1X
IL-6 + Dex.      6X       26X      2X
TGF-β            147X     1X       2X
LPS              2X       1X       1X
TNF              0.7X     0.3X     0.8X
IL-1             0.9X     0.3X     1X
Thrombin         1X       0.9X     1X



The 1500 bp PAI-1 promoter present in the p1500Luc vector
was slightly responsive to IL-6, LPS and a mixture of IL-6 plus
dexamethasone. In contrast, the induction of luciferase
expression in response to activation by TGF-β was 147-fold over
that seen in the control untreated cells. Furthermore, IL-6
and IL-6 plus dexamethasone were effective activating agents
when used in the presence of a vitronectin promoter. None of
the agents were significantly effective at inducing expression
from the RSV promoter.
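
The fold induction reported in Table 1 is the ratio of the luciferase signal from treated cells to that from untreated control cells. A minimal Python sketch follows; the function name is hypothetical, and the raw RLU values used in any worked example would be illustrative, since the specification reports only the resulting ratios.

```python
def fold_induction(treated_rlu: float, control_rlu: float) -> float:
    """Fold induction = RLU of treated cells / RLU of untreated control cells.

    A value of 1 means no change relative to the control; values below 1
    indicate a reduction in reporter expression.
    """
    if control_rlu <= 0:
        raise ValueError("control RLU must be positive")
    return treated_rlu / control_rlu
```

For instance, a hypothetical control reading of 100 RLU and a treated reading of 14,700 RLU would correspond to the 147-fold induction reported for TGF-β with the PAI-1 promoter.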

These results confirm that TGF-β is the predominant
activator of the PAI-1 promoter and that the TGF-β assay of
this invention exhibits remarkable specificity. Thus, the
assay is valuable in that purified TGF-β, or even TGF-β
present in unknown quantities in a
complex solution containing many promoter-specific molecules,
can be readily measured without confounding by contaminants.
With the added control of pre-treating the liquid samples with
neutralizing antibodies to TGF-β isoforms, the absolute amounts
of TGF-β as well as isoform type can be determined.

D. Effects of Serum on Quantifying TGF-β in the
TGF-β Assay Method

To assess the effects of serum on the quantification
of TGF-β, TGF-β assays were performed in the presence of DMEM-
BSA containing rTGF-β1 alone (closed squares), or with 0.5%
(closed circles), 1% (closed triangles), or 2% (closed
diamonds) calf serum. The rTGF-β1 concentrations in the assays
ranged from 0 to 8 pM. As shown in Figure 4C, serum
enhanced the induction of the PAI/L construct by rTGF-β1
in a manner similar to that of purified growth factors as shown
in Example 3C. At low rTGF-β1 concentrations (<1 pM), addition of 0.5, 1
or 2% serum had little effect on the luciferase activity. As
the rTGF-β1 concentration was increased, the serum-containing
curves were shifted upwards, possibly as a result of growth
factors such as bFGF in the serum.

E. Comparison of the TGF-β Assay with the
MLEC Assay and the Radioreceptor Assay for
Quantifying TGF-β

Quantification of TGF-β in a defined medium (DMEM-BSA)
lacking growth factors or serum, as demonstrated in Example 3D,
is, however, rarely encountered in the laboratory. For this reason,
TGF-β assays were also performed in COS, BSM and BAE cell
conditioned medium (CM), all of which normally contain latent
but little, if any, active TGF-β. These samples were tested
using the TGF-β assay method of this invention in comparison
with the MLEC (mink lung epithelial cell tritiated thymidine
uptake cell assay).

The TGF-β assay was performed as described in Example 3A
with rTGF-β1 ranging in concentration from 0 to 40 pM in the
presence of either DMEM-BSA (closed squares), COS CM (crosses),
BSM CM (closed triangles) or BAE CM (closed circles). To
prepare conditioned medium, BAE cells were cultured in alphaMEM
medium (Bio-Whittaker, Walkersville, MD) containing 5% fetal
calf serum. BSM and COS cells were cultured in DMEM
supplemented with 10% calf serum (Bio-Whittaker). Conditioned
medium was prepared by a 24 hour incubation of the indicated
cells with DMEM containing 0.1% pyrogen-poor BSA
(weight/volume) (Pierce, Rockford, IL). All media were
supplemented with L-glutamine (2 mM), penicillin G (100 U/ml)
and streptomycin sulfate (100 µg/ml) (Irvine Scientific, Santa
Ana, CA).

The MLEC assay was performed essentially as described by
Lucas et al., In Peptide Growth Factors, Barnes et al., Eds.,
Academic Press Inc. 198:303-316 (1991). Briefly, 100 µl
aliquots of the samples were placed in 96-well plates
containing 10^4 MLE cells per well in 100 µl of assay buffer
(DMEM containing 0.25% fetal calf serum and 10 mM HEPES).
After 20 hours at 37°C, one µCi of 3H-thymidine (6.7 Ci/mmol, Du
Pont Co., Boston, MA) in 20 µl of the assay buffer was added to
each well, and the plates incubated an additional 4 hours. The
cells were harvested by incubation with 100 µl of 0.25%
trypsin/1 mM EDTA at 37°C for 15 minutes, transferred onto glass
fiber filters, and placed into vials containing liquid
scintillation solution. The amount of radioactivity was
quantified with a Beckman LS 3801 β-scintillation counter
(Fullerton, CA).

As clearly shown by the data indicated by the unbroken
lines in Figure 5, both BAE and BSM CM contained factors that
stimulated thymidine incorporation in the MLEC assay 5-6 fold.
Only at rTGF-β1 levels greater than or equal to 1 pM was the
3H-thymidine incorporation suppressed to a level equal to that
of non-conditioned medium (DMEM-BSA). In contrast, COS CM
contained factors that strongly inhibited 3H-thymidine
incorporation. With all three of these CM, calculation of TGF-β
concentration would be very difficult using 3H-thymidine
incorporation. In contrast, when different CM were used in the
TGF-β assay as indicated in Figure 5 with the data plotted with
broken lines, there were also slight changes but these
differences were much less significant than those seen with the
MLEC assay. BAE CM, which contains bFGF, shifted the response
curve to higher values. BSM and COS CM had only minor effects
on the standard curves.

When bFGF (closed diamonds), EGF (open circles), PDGF-BB
(open triangles), rIL-1alpha (open squares), and the TGF-βs
(rTGF-β1 (closed squares), rTGF-β2 (closed circles), and
rTGF-β3 (closed triangles)) were tested for their ability to affect
3H-thymidine incorporation into non-transfected MLE cells in
the MLEC assay performed as described above, more striking
effects were observed as shown in Figure 6. The three TGF-β
isoforms, especially TGF-β3, decreased 3H-thymidine
incorporation as expected. IL-1alpha and PDGF-BB had little
effect, but bFGF and EGF had strong dose-dependent stimulatory
effects on 3H-thymidine incorporation. Such effects can make
the MLEC assays inaccurate and difficult to analyze.

F. Quantitation of Total TGF-β Levels

In order to analyze total levels of TGF-β, BAE CM
collected after 12 or 24 hours was heat treated at 80°C for
10-12 minutes to activate endogenous latent TGF-β as described by
Brown et al., Growth Factors, 3:35-43 (1990). After cooling, the
samples were diluted to 5, 10 or 20% of their original
concentration with DMEM-BSA and were quantified using the TGF-β
assay. TGF-β concentrations of 23.4±3.4 pM (12 hour CM) and
122.1±16 pM (24 hour CM) were determined via comparison with a
rTGF-β standard reference curve generated from plotting the
detected amounts of luciferase activity that resulted from a
range of predetermined amounts of TGF-β as described in Example
3A.
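
Because the samples were diluted before assay, the concentration read from the reference curve must be divided by the dilution fraction to recover the concentration in the undiluted conditioned medium. The Python sketch below illustrates that correction and the averaging of results across dilutions; the averaging step, the function names and the example readings are assumptions made for illustration and are not taken from the specification.

```python
from typing import List, Tuple


def undiluted_concentration_pM(measured_pM: float, fraction_of_original: float) -> float:
    """Correct a measured concentration for dilution.

    fraction_of_original: e.g. 0.05, 0.10 or 0.20 for samples diluted to
    5, 10 or 20% of their original concentration.
    """
    if not 0 < fraction_of_original <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return measured_pM / fraction_of_original


def mean_undiluted_pM(measurements: List[Tuple[float, float]]) -> float:
    """Average the dilution-corrected values from (measured_pM, fraction) pairs."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    corrected = [undiluted_concentration_pM(m, f) for m, f in measurements]
    return sum(corrected) / len(corrected)


if __name__ == "__main__":
    # Hypothetical readings at 5, 10 and 20% dilutions, for illustration only.
    print(mean_undiluted_pM([(1.2, 0.05), (2.4, 0.10), (4.8, 0.20)]))
```

Agreement among the corrected values from different dilutions is a useful check that the diluted readings fall within the linear range of the reference curve.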

The heat-activated CM were also assayed using the highly
specific radioreceptor assay as described by Kojima et al.,
J. Cell. Physiol., 155:323-332 (1993), the disclosure of which is
hereby incorporated by reference. Briefly, murine AKR-2B
fibroblasts at 1 x 10^5 cells/well were plated in a 24-well
plate in McCoy's 5A medium (Gibco BRL) supplemented with 5%
fetal calf serum. The following day, the cells were washed 3
times with binding buffer (McCoy's 5A, 0.1% BSA, 25 mM HEPES at
pH 7.4) and were pre-incubated in 250 µl of binding buffer for
1 hour at room temperature. The medium was removed, and the
cells were incubated for 2 hours at room temperature in a
mixture of 125 µl of binding buffer containing 50 pM
125I-rTGF-β1 and an equal volume of heat-activated (80°C for 10 minutes)
BAE CM or serial dilutions of cold rTGF-β1. The cells were
washed 3 times with binding buffer, and the bound radioactivity
was solubilized in cell lysis buffer (Analytical Luminescence)
and was measured in a Packard Multi-PRIAS 1 gamma counter
(Meriden, CT). The radioreceptor assay was sensitive between
0.0004 and 2 nM rTGF-β1.

In the radioreceptor assay, concentrations of 24±1.1 pM
(12 hour CM) and 128±48.8 pM (24 hour CM) were calculated. The
essentially identical results quantifying the amount of TGF-β
in conditioned medium between the TGF-β assay described above
and the radioreceptor assay verify the accuracy and specificity
of the TGF-β assay of this invention.

Thus, a highly sensitive and specific, non-radioactive
assay for mature TGF-β has now been developed. When compared
to the sensitive and widely used MLEC method for measuring
TGF-β concentration, the TGF-β assay was more rapid, had comparable
sensitivity, and a greater detection range. Specificity of
this assay was also higher as evidenced by its relative
insensitivity to factors such as EGF and bFGF which can greatly
affect other assays. The most remarkable example of the TGF-β
assay specificity was observed with COS cell CM which
completely inhibited the MLEC assay, while having no
detrimental effects in the TGF-β assay.

In addition to the TGF-β assay of this invention and the
MLEC and radioreceptor assays described herein, other assays
have been used to detect mature TGF-β including anchorage-
independent growth assays, differentiation-based assays, cell
migration and plasminogen activity assays, radioimmunoassays
and enzyme-linked immunosorbent assays. Although all of these
assays can detect mature TGF-β, the low concentrations of
TGF-β, generally less than 2 pM, generated in many biological
systems make many of them impractical without prior
concentration of the sample, which can result in large losses of
the mature growth factor or even activation of latent TGF-β.
The TGF-β assay of this invention overcomes these deficiencies
by being highly sensitive and specific as well as
nonradioactive. The specificity and sensitivity of the assay
are the result of using a truncated PAI-1 promoter beginning at
-800 and extending through 76 of the PAI-1 5' promoter that
retains two regions responsible for maximal response to TGF-β
as described by Keeton et al., J. Biol. Chem., 266:23048-23052
(1991). Use of the complete PAI-1 promoter and upstream
elements results in decreased specificity, as responsive elements
for other molecules present in complex solutions may be
activated or inhibited, deleteriously affecting the ability to
quantify TGF-β. Moreover, the truncated PAI-1 promoter used
above has been further fragmented to smaller, more specific
TGF-β response elements as described in Example 4 to enhance
specificity and increase the sensitivity of the TGF-β assay
method.

When the TGF-β assay is compared to the sensitive and
widely used MLEC assay for quantifying TGF-β concentrations,
the TGF-β assay was more rapid and had comparable sensitivity but
with a greater detection range. Specificity of the assay was
also higher as evidenced by the TGF-β assay's insensitivity to
growth factors such as EGF and bFGF that have been shown to
greatly affect other assays. The most striking example of the
specificity of the TGF-β assay was observed with the COS cell
line conditioned medium that completely inhibited the MLEC
assay while having no detrimental effects in the TGF-β assay as
shown in Figure 5.

Although the TGF-β assay is not isoform specific, use of
the appropriate standard reference curves and addition of
neutralizing antibodies to the various TGF-β species allows for
quantitation of unique isoforms. While the TGF-β assay of
this invention is highly specific, highly specific
neutralizing antibodies to TGF-β were used to verify that no
other molecules were present in test liquid samples that may
have affected the quantitation of TGF-β in the assay.
Considering its large range and specificity, this rapid,
sensitive, non-radioactive, easily performed assay is of
invaluable use in determining active TGF-β concentrations in
complex solutions, particularly so with the use of parallel
assays with neutralizing antibodies to TGF-β in complex unknown
samples to verify that no other molecules are present that can
affect the assay through either inhibition or activation of
other regions of the truncated PAI-1 promoter.
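
As a rough illustration of how a standard reference curve can be used to convert a luciferase reading into a TGF-β concentration, the following Python sketch interpolates on a log-log scale between the two bracketing standards. The standard concentrations, the relative light unit (RLU) values and the function name are hypothetical placeholders chosen for illustration; they are not data or procedures taken from this specification, and log-log interpolation is only one assumed way of reading such a curve.

```python
import math

# Hypothetical standard curve: (TGF-beta concentration in pM, luciferase signal in RLU).
# These numbers are illustrative only; they are not measurements from the specification.
STANDARDS = [(0.2, 1200.0), (1.0, 5000.0), (5.0, 21000.0), (30.0, 90000.0)]


def concentration_from_rlu(rlu, standards=STANDARDS):
    """Estimate a concentration by log-log linear interpolation between standards."""
    pts = sorted(standards, key=lambda p: p[1])
    if not pts[0][1] <= rlu <= pts[-1][1]:
        # Readings outside the curve are rejected rather than extrapolated.
        raise ValueError("reading lies outside the range of the standard curve")
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pts, pts[1:]):
        if r_lo <= rlu <= r_hi:
            frac = (math.log(rlu) - math.log(r_lo)) / (math.log(r_hi) - math.log(r_lo))
            return math.exp(math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo)))
    raise AssertionError("unreachable: reading was inside the curve range")
```

A parallel sample treated with neutralizing antibody could be passed through the same function to check that the antibody removes the signal attributed to TGF-β.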

4. Quantifying TGF-β in Cells Transiently Transformed
with Expression Vectors Having Shorter Fragments of
the PAI-1 Promoter Containing TGF-β Response
Elements

The regulation of PAI-1 by TGF-β appears to affect a
number of biological systems and the mechanism of
transcriptional regulation by TGF-β has been studied by a
number of groups. For example, the autoinduction of the TGF-β1
promoter suggests a feedback loop designed to amplify the
response to TGF-β under certain conditions. This response was
shown to involve specific AP-1 sites. AP-1 is a heterodimeric
complex of Fos and Jun protein subunits that binds to specific
DNA enhancer sites which have the consensus sequence TGASTCA
(SEQ ID NO 26), where S can be either G or C. AP-1 is believed
to mediate the transcriptional effects of the tumor promoting
phorbol esters.

In contrast to these results, the TGF-β response sequence
in the promoter for type 1 collagen has been localized to a
sequence with homology to a nuclear factor 1 (NF-1) binding
site. A number of different consensus sequences for NF-1 have
been described and these include the sequences TGGN7GCCAA (SEQ
ID NO 27), where N can be either A, C, G or T, and TGGCA (SEQ
ID NO 28). The effect of TGF-β on the PAI-1 promoter has been
studied, resulting in the demonstration that the responsive
regions contain sequences with homology to the AP-1 consensus
sequence.
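
The consensus sequences quoted above can be turned into search patterns in a few lines. The sketch below is an illustration only: it assumes that "N7" denotes seven consecutive N positions, it scans a single strand, and the helper names are hypothetical rather than part of any method described in this specification.

```python
import re

# IUPAC codes needed for the consensus sequences quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[GC]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus such as TGASTCA into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


AP1 = consensus_to_regex("TGASTCA")                        # SEQ ID NO 26
NF1_LONG = consensus_to_regex("TGG" + "N" * 7 + "GCCAA")   # SEQ ID NO 27, N7 read as 7 x N
NF1_SHORT = consensus_to_regex("TGGCA")                    # SEQ ID NO 28


def find_sites(pattern, sequence):
    """Return 0-based start offsets of every match, including overlapping ones."""
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(sequence)]
```

Scanning the reverse complement as well would be needed to find sites on the opposite strand; that step is omitted here to keep the sketch short.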

To determine the role of AP-1 in the regulation of the
PAI-1 promoter in more detail and to identify smaller TGF-β
responsive regions within the PAI-1 promoter of the p800neoLuc
expression vector prepared in Example 1 for use in quantifying
TGF-β in Example 3, the effect of both TGF-β and AP-1 on the
activity of a 25 bp fragment corresponding to the PAI-1
promoter between -674 and -650 in the 5' flanking region was
evaluated. This fragment contained one of the AP-1 like
sequences that responded to TGF-β. The expression vectors for
use in assessing the requirement for AP-1, including the one
containing the 25 bp fragment, were prepared as described in
Example 1C.

A. TGF-β Activation of PAI-1 Promoter Fragments

AP-1 like sites are located within each of three
regions of the 5' flanking region of the PAI-1 promoter from
-87 to -49, from -674 to -636 and from -740 to -703.
Oligonucleotides having portions or all of these regions were
synthesized and cloned into a pUC-luciferase expressing plasmid
containing the minimal promoter as described in Example 1C.
The resultant plasmids were transiently transfected into
recipient Hep3B cells as described in Example 2C and evaluated
for their response to TGF-β as measured by luciferase
expression as described in Example 3A. The plasmid designated
p56Luc contained an oligonucleotide sequence that corresponded
to -56 to -41 of the PAI-1 promoter gene (also referred to as
region A) and conferred a 10-fold induction of measurable TGF-β
as compared to a 3-fold induction obtained with a plasmid
expression vector only containing the minimal promoter
sequence.

Another plasmid designated p674Luc, deposited with ATCC
and having ATCC Accession Number 75627, contained an
oligonucleotide sequence 25 bp in length that corresponded to
-674 to -650 of the PAI-1 promoter (also referred to as region
B). This nucleotide sequence conferred a 70-fold induction on
the minimal promoter. The plasmid designated p743Luc contained
an oligonucleotide sequence 35 bp in length that corresponded
to -743 to -708 of the PAI-1 promoter (also referred to as
region C). This nucleotide sequence conferred a 35-fold
induction in the promoter. The plasmid designated p732Luc
exhibited 62-fold induction while the plasmid, p732HBV, having
the hepatitis B virus (HBV) minimal promoter sequence instead
of the PAI-1 sequence exhibited 47-fold induction.

This result is in comparison to 6-fold basal induction
from a control plasmid having only the HBV minimal promoter
without any TGF-β response elements. The nucleotide
sequences of the sense strand of the HBV minimal promoter-
containing plasmid having or lacking the neomycin selectable
marker gene are listed respectively in SEQ ID NOs 23 and 24.
In parallel assays, the p800Luc plasmid that contained 3 AP-1
like sequences conferred greater than 150-fold induction of
TGF-β responsiveness as compared to the minimal promoter
sequence. The stably transformed pl500Luc similarly resulted
in approximately 150-fold induction. These results as well as
the others presented in the Examples represent the average of
at least 4 independent experiments, each performed in
duplicate.
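
The fold-induction figures above are averages over independent experiments performed in duplicate. The text does not state how the averaging was carried out, so the following sketch simply assumes that duplicate readings are averaged within each experiment, a ratio of treated to untreated signal is taken, and the ratios are then averaged across experiments. All readings in the usage note are hypothetical.

```python
from statistics import mean


def fold_induction(treated_rlu, untreated_rlu):
    """Fold induction for one experiment: mean treated signal over mean untreated signal."""
    return mean(treated_rlu) / mean(untreated_rlu)


def average_fold_induction(experiments):
    """Average the per-experiment fold induction over independent experiments.

    `experiments` is a list of (treated readings, untreated readings) pairs,
    one pair per independent experiment.
    """
    return mean(fold_induction(treated, untreated) for treated, untreated in experiments)
```

For example, four hypothetical experiments with duplicate readings `([550.0, 650.0], [10.0, 10.0])`, `([780.0, 820.0], [10.0, 10.0])`, `([690.0, 710.0], [10.0, 10.0])` and `([1350.0, 1450.0], [20.0, 20.0])` give per-experiment values of 60, 80, 70 and 70, for an average of 70-fold.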

Regions A and C contained only a single AP-1 like sequence
whereas region B contained 2 AP-1 like binding sequences.
Thus, oligonucleotides containing AP-1 like sequences from each
region were able to confer TGF-β responsiveness to a
non-responsive minimal promoter.

B. Responsiveness of the TGF-β Responsive Regions
A, B and C to c-fos/c-jun

In order to directly test the response of the p56Luc,
p674Luc and p743Luc plasmids to AP-1, they were cotransfected
together into Hep3B cells with plasmids containing the mouse
genes for c-fos and c-jun under the control of the RSV
promoter. All three of these regions showed a dose-dependent
response to increasing amounts of c-fos/c-jun, with maximum

seen usin9 °- x " 9/we11 of c - fos and i*. 

This response was dependent on co-transfection of both plasmids
since neither c-fos nor c-jun alone was sufficient.

To determine the minimal TGF-β responsive sequence in the
PAI-1 promoter region from nucleotide position -743 to -708,
the sequence of which is listed in SEQ ID NO 16, two
oligonucleotides were made, the first from the 3' side of
region C which contained the AP-1 like sequence (C2: -723 to
-708, corresponding to the sequence in SEQ ID NO 16 from 21 to
36) and the second from the remaining 5' sequence (C3: -743 to
-727, corresponding to the sequence in SEQ ID NO 16 from 1 to
17). When the oligonucleotides were examined for response to
TGF-β, neither the C2 nor the C3 sequence showed maximal
induction with TGF-β (10-fold and 3-fold induction, respectively)
as compared to region C itself (25-fold induction). This result
indicated that a portion of a TGF-β responsive binding site
located between -723 and -727 was deleted. The 5' side of C2
was extended progressively to include bases between -723
and -728 and tested for TGF-β induction, and it was found that
this did not improve the TGF-β response. However, when the
region was extended

another 4 bp there was a dramatic increase in the TGF-β response.

To assess the role of the AP-1 site compared to the
5' TGF-β responsive site, the response of the minimal promoter
having the 5' flanking region of the PAI-1 promoter from -39 to
+76 to direct stimulation with c-fos/c-jun was determined. It
showed 10-fold induction with AP-1 compared to only 3-fold
induction with TGF-β. When C5 was tested in a similar manner
there was only a 2-fold increase above the vector background
induced by c-fos/c-jun compared to a greater than 20-fold
increase above background seen with TGF-β (C5 itself showed
63-fold induction). Thus, although the wild type AP-1 site in C5
was only a relatively poor responsive sequence to c-fos/c-jun,
this region still showed a strong response to TGF-β. The AP-1
site was therefore mutated to produce a consensus AP-1 sequence
(TGACACA to TGAGTCA, SEQ ID NOs 29 and 30, respectively) and
the response of the mutant to both c-fos/c-jun and TGF-β was
compared. This mutation increased the AP-1 response from
19-fold to 105-fold but did not improve the TGF-β response. In
fact, a consistent decrease was seen in the TGF-β response
following this mutation (63-fold induction with TGF-β for the
wild type AP-1 like site to 30-fold for the consensus AP-1
site).
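
The "above background" comparisons in this paragraph can be reproduced from the quoted fold inductions. The sketch below assumes that "above the vector background" means the ratio of a construct's fold induction to that of the minimal promoter vector (rather than a difference); the table keys and the function name are labels introduced here for illustration.

```python
# Fold inductions quoted in the text: construct -> (c-fos/c-jun, TGF-beta).
REPORTED = {
    "minimal promoter": (10.0, 3.0),
    "C5 wild type": (19.0, 63.0),
    "C5 consensus AP-1": (105.0, 30.0),
}


def over_background(construct, background="minimal promoter", table=REPORTED):
    """Express each response as a multiple of the vector (minimal promoter) background."""
    ap1, tgf = table[construct]
    bg_ap1, bg_tgf = table[background]
    return ap1 / bg_ap1, tgf / bg_tgf
```

Under this assumption, wild type C5 gives about 1.9 times background with c-fos/c-jun and 21 times background with TGF-β, in line with the roughly 2-fold and greater than 20-fold figures stated above.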

The AP-1 like site was then mutated by changing the
critical TGA bases, a change shown by others to decrease the
activity of the AP-1 binding site. Although this mutation had
the expected effect of abolishing the AP-1 response, it did not
completely abolish the response of this construct to TGF-β
(10-fold induction with c-fos/c-jun [i.e., vector background] but a
13-fold induction with TGF-β [i.e., 5-fold above vector
background]).

This result once again suggested that the 5' portion of C5
(-732 to -708) was more critical than the AP-1 like sequence in
mediating the TGF-β response. To further test this hypothesis,
4 bp between -728 and -732 were mutated (the resultant mutated
vector designated C8) since the previous deletion results
suggested that this sequence was critical to the TGF-β
response. A 3 bp sequence between -726 and -728 was also
mutated (the resultant vector was designated C9). As expected,
both of these 5' mutations caused dramatic reductions in the
response of C5 to TGF-β (60-fold to 4-fold for both C8 and C9).
These changes had little effect on the AP-1 response which
decreased only slightly from 19-fold to 13-fold. A double
mutation of both of these sites was also created and this
abolished both the TGF-β and the AP-1 activity.

E. Heterologous Promoter Induction

To test whether the 25 bp oligonucleotide from the
PAI-1 promoter region C5, -732 to -708 (SEQ ID NO 15), was able
to activate a heterologous promoter, it was cloned into a
hepatitis B viral promoter, the latter of which had the
nucleotide sequence from -188 to +145 of the viral promoter
(SEQ ID NO 19). Control experiments found that this construct
alone showed 28-fold induction with fos/jun. However, the
viral promoter showed only 4-fold induction with TGF-β. Thus,
even though the hepatitis B viral promoter had active AP-1 like
sites, these were not sufficient for a strong TGF-β response.

The region between -708 and -732 of the PAI-1 promoter
(C5) was then cloned into the viral promoter and the resultant
construct was tested as above. The 25 bp PAI-1 fragment was
able to dramatically increase the TGF-β response of the viral
promoter from 4-fold to 47-fold but did not alter the AP-1
response (25-fold compared to 28-fold). Finally, mutation of
bases between -732 and -728 of the PAI-1 promoter
oligonucleotide dramatically reduced the TGF-β induction of
this fragment but did not lower the response to AP-1.

F. AP-1-Independent TGF-β Induction

To determine if the 5' -732 to -708 nucleotide
sequence from the PAI-1 promoter could function independently
of the AP-1 site in the TGF-β response, a 15 bp oligonucleotide
containing bases between -732 and -718 (corresponding to the
nucleotide sequence from position 1 to 15 in SEQ ID NO 17),
which excludes the AP-1 like site, was cloned into a
pUC-luciferase expression vector having the minimal PAI-1 promoter.
This 15 bp sequence was able to confer 20-fold induction with
TGF-β with the minimal PAI-1 promoter and did not show any AP-1
activity.
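
The correspondence between promoter coordinates and sequence-listing positions used here (bases -732 to -718 as positions 1 to 15) is simple offset arithmetic. The sketch below assumes that both coordinates lie upstream of the transcription start site, so that no position is skipped between -1 and +1; the function name is hypothetical.

```python
def promoter_to_listing_position(coord, region_start):
    """Map a promoter coordinate to a 1-based position within a listed fragment.

    Both arguments are upstream (negative) promoter coordinates, and
    region_start is the 5' end of the fragment as it appears in the listing.
    """
    if coord >= 0 or region_start >= 0 or coord < region_start:
        raise ValueError("expected negative coordinates with coord >= region_start")
    return coord - region_start + 1
```

With a fragment starting at -732, coordinate -718 maps to position 15, matching the range given above. With region C starting at -743, coordinates -723 and -708 map to positions 21 and 36, and -727 maps to position 17, which agrees with the SEQ ID NO 16 ranges quoted for the C2 and C3 oligonucleotides.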

With regard to the AP-1 like sites involved in this
response, unlike the consensus sequence for AP-1 (TGASTCA,
where S is G or C (SEQ ID NO 26)), the most active sequences
from the PAI-1 promoter all have the sequence TGA(N)ACA where N
is either A, C, G or T (SEQ ID NO 31) (PAI-1 promoter: -717 to
-711 = TGACACA (SEQ ID NO 29); -659 to -653 = TGATACA (SEQ ID
NO 32)). It is possible that the T to A substitution may affect
the binding affinity enough to preferentially bind a
protein other than c-fos/c-jun. This is consistent with the
functional data on the AP-1 like site of the PAI-1 promoter
(between -711 and -717) which indicate that the wild type
sequence is a poor AP-1 binding site and yet is still important
in the TGF-β response.
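
The difference between the PAI-1 sites and the AP-1 consensus can be checked position by position. The following sketch is an illustration only; the helper name and the table of permitted bases are introduced here and cover just the IUPAC symbols that appear in the sequences quoted above.

```python
# Bases permitted by each IUPAC symbol used in the consensus sequences.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "GC", "N": "ACGT"}


def mismatches(site, consensus):
    """Return (1-based position, observed base, consensus symbol) for each disagreement."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        (i, base, symbol)
        for i, (base, symbol) in enumerate(zip(site, consensus), start=1)
        if base not in ALLOWED[symbol]
    ]
```

Compared with TGASTCA, the site TGACACA differs only at position 5 (A in place of T), while TGATACA differs at positions 4 and 5. Both sites agree with TGA(N)ACA at every position.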

The mutation and deletion data of the 25 bp sequence from
the wild type PAI-1 promoter (-732 to -708) suggested that the
5' side of the oligonucleotide may contain a second binding
site of importance in the TGF-β response. In fact this region
appeared to be more critical than the AP-1 sequence since
mutation of this region almost completely abolished the TGF-β
response even though the AP-1 region was intact. When this
sequence alone was evaluated, it was able to act independently
of the AP-1 site and promote strong TGF-β induction of the
normally unresponsive minimal promoter. However, the full
TGF-β response was dependent on the functional activity of both the
AP-1 like site and the 5' site. When the 5' 15
bp sequence was compared to the other region of the PAI-1
promoter which also showed strong TGF-β induction (region B =
60-fold), a sequence was found that was common to both of these
regions (CCNTGTNT, where N is either A, C, G or T (SEQ ID NO
33)).

In summary, the TGF-β response of the PAI-1 promoter has
been localized to specific AP-1 like sites. However, the full
TGF-β response of this region of the PAI-1 promoter is
dependent on the interaction of two binding sites. The first
site has homology to an AP-1 site but does not appear to bind
AP-1. While this site is not essential, it is required for the
full TGF-β induction from this region. The second site,
located 5' to the AP-1 site, appears to be critical in the
TGF-β response. This site is 15 bp in size and contains a motif
that is present in both active regions of the PAI-1 promoter as
well as in the most responsive region of the TGF-β promoter.
This novel sequence does not appear to match any previously
described transcription factor binding sites and may represent
a new and specific binding site which is critical for a strong
TGF-β response.

5. Deposit of Materials

The plasmids, p674Luc, p743Luc and p732Luc, were deposited
on or before December 16, 1993 with the American Type Culture
Collection, 12301 Parklawn Drive, Rockville, MD, USA (ATCC) and
assigned the respective ATCC Accession Numbers ATCC 75627, ATCC
75628 and ATCC 75629. The cell line, Hep3B, stably transfected
with plasmid pl500Luc for a transformed cell line designated
LUCI, was also deposited on or before December 16, 1993 with
ATCC and assigned the ATCC Accession Number CRL 11508. The
deposit thus provides plasmids and a stably transfected cell
line containing plasmid pl500Luc. These deposits were made
under the provisions of the Budapest Treaty on the
International Recognition of the Deposit of Microorganisms for
the Purpose of Patent Procedure and the Regulations thereunder
(Budapest Treaty). This assures maintenance of viable plasmids
and cell lines for 30 years from the date of deposit. The
plasmids and cell line will be made available by ATCC under the
terms of the Budapest Treaty which assures permanent and
unrestricted availability of the progeny of the culture to the
public upon issuance of the pertinent U.S. patent or upon
laying open to the public of any U.S. or foreign patent
application, whichever comes first, and assures availability of
the progeny to one determined by the U.S. Commissioner of
Patents and Trademarks to be entitled thereto according to 35
U.S.C. §122 and the Commissioner's rules pursuant thereto
(including 37 CFR §1.14 with particular reference to 886 OG
638). The assignee of the present application has agreed that
if the plasmid or cell line deposits should die or be lost or
destroyed when cultivated under suitable conditions, they will
be promptly replaced on notification with a viable specimen of
the same plasmid or cell culture. Availability of the
deposited plasmids is not to be construed as a license to
practice the invention in contravention of the rights granted
under the authority of any government in accordance with its
patent laws.

The foregoing written specification is considered to be 
sufficient to enable one skilled in the art to practice the 
invention. The present invention is not to be limited in scope 
by the plasmids deposited, since the deposited embodiment is 
intended as a single illustration of one aspect of the 
invention and any plasmids that are functionally equivalent are 
within the scope of this invention. The deposit of material 
does not constitute an admission that the written description 
herein contained is inadequate to enable the practice of any
aspect of the invention, including the best mode thereof, nor 
is it to be construed as limiting the scope of the claims to 
the specific illustration that it represents. Indeed, various 
modifications of the invention in addition to those shown and 
described herein will become apparent to those skilled in the 
art from the foregoing description and fall within the scope of 
the appended claims.

(1) GENERAL INFORMATION:

(i) APPLICANT: 

(A) NAME: The Scripps Research Institute 

(B) STREET: 10666 North Torrey Pines Road 

(C) CITY: La Jolla 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP) : 92037 

(G) TELEPHONE: 619-554-2937

(H) TELEFAX: 619-554-6312 

(ii) TITLE OF INVENTION: A NEW SENSITIVE METHOD FOR QUANTIFYING 
ACTIVE TRANSFORMING GROWTH FACTOR-BETA AND COMPOSITIONS 
THEREFOR 

(iii) NUMBER OF SEQUENCES: 33 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(v) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US 95/ 

(B) FILING DATE: 25-JAN-1995

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/188,227

(B) FILING DATE: 25-JAN-1994



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11293 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTC 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTGAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TCCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500
TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 
GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 
ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 
GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 
CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA CCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120
AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180
CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 
GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 
GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 
ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 
CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 
CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 
TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTGAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCCTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560
GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620
CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGACTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGG GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTCTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGGC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 
CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 
TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 
GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 
CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 
AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 
ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTCTCGTC 6120
AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180
TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240
AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300
CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360
TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420
ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACGCG 6480
CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600
GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660
ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720
CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780
TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840
TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTCC TTTAAAAAAC CTCCCACACC 6900
TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960
CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020
CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080
CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 7140
CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC 7200
TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACATG GCAGGGATGA 7260
GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA CAGACAAAAC CTAGACAATC 7320
ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG GAGGAGGGAG GGGCGCTCTT 7380
TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG GGAAAACTTC CACGTTTTGA 7440
TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC CAAAGGAAAA GCAGGCAACG 7500
TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC CTAGGCTTTT TGGGTCACCC 7560
GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG ACAGACACAG GCAGAGGGCA 7620
GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT TTGCTCAATT GTTCCTGAAT 7680
GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC ACACACACAT CCCTCACCAA 7740
GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT TCAGACGGAC TGCCAGAGCC 7800 
AGXCAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC TGCCCACATC TGGTATAAAA 7860 
GGAGGCAGTG GCCCACAGAG GAGCACAGCT GTGTTXGGCT GCAGGGGCAA GAGGGCTGTC 7920 
AAGAAGACCC AGAGGCGGCC CTCCAGCAGC TGAATTGGAG CTGGCATTCC GGTAGTGTTG 7980 
GTAAAATGGA AGACGCCAAA AACATAAACA AAGGCCGGGC GGGATTCTAT GGXCTAGAGC 8040 
ATGGAAGCGC TGGAGAGCAA CTGCATAAGG CTATGAAGAG ATACGCCCTG GTTCCTGGAA 8100 
CAATTGCTTT TACAGATGCA CATATCGAGG TGAACATCAC GTAGGCGGAA TACTTCGAAA 8160
TGTCCGTTCG GTTGGCAGAA GCTATGAAAC GATATGGGCT GAATACAAAT CACAGAATCG 8220
TCGTATGGAG TGAAAAGTCT CTTGAATTGT TTATGCCGGT GTTGGGCGCG TTATTTATCG 8280
GAGTTGCAGT TGCGCCCGCG AACGACATTT ATAATGAACG TGAATTGCTC AACAGTATGA 8340
ACATTTCGCA GCCTACCGTA GTGTTTGTTT GCAAAAAGGG GTTGCAAAAA ATTTTGAACG 8400
TGCAAAAAAA ATTACCAATA ATCCAGAAAA TTATTATGAT GGATTCXAAA ACGGATTACC 8460
AGGGATTTCA GTCGATGTAG ACGTTCGTGA CATGTCATCT ACCTCGCGGT TTTAATGAAT 8520
ACGATTTTGT ACCAGAGTCC TTTGATCGTG ACAAAACAAT TGCACTGATA ATGAATTCCT 8580
CTGGATCTAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA TAGAACTGCC TGCGTCAGAT 8640
TCTCGCATGC CAGAGATCCT ATTTTTGGCA ATCAAATCAT TCCGGATACT GCGATTTTAA 8700
GTGTTGTTCC ATTCCATCAC GGTTTTGGAA TGTTTACTAC ACTGGGATAT TTGATATGTG 8760
GATTTCGAGT CCTCTTAATG TATAGATTTG AAGAAGAGCT GTTTTTACGA TCCCTTCAGG 8820 
ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT TTCATTCTTC GGGAAAAGGA 8880 
CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT TGCTTCTGGG GGCGCACCTC 8940 
TTTCGAAAGA AGTGGGGGAA GCGGTTGCAA AACGCTTGCA TGTTCCAGGG ATACGAGAAG 9000 
GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC ACGGGAGGGG GATGATAAAG 9060 
CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGGGAA GGTTGTGGAT CTGGATACCG 9120
GGAAAACGGT GGGCGTTAAT CAGAGAGGCG AATTATGTGT CAGAGGACCT ATGATTATGT 9180 
CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT TGACAAGGAT GGATGGCTAC 9240
ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT CTTCATAGTT GACCGCTTGA 9300
AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC TGAATTGGAA TCGATATTCT 9360
TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT TCCCGACGAT GACGCCGGTG 9420 
AACTTCCCGC CGCCCTTGTT GTTTTGGAGC ACGGAAAGAC GATGACGGAA AAAGAGATCG 9480 
TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT GCGCGGAGGA GTTGTGTTTG 9540
TGCACGAACT ACCGAAAGGT CTTACCGGAA AACTCGACGC AAGAAAAATC AGAGAGATCC 9600 
TCATAAAGCC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA TGTAACTGTA TTCAGCGATG 9660 
ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT GTGAAGGAAC CTTACTTCTG 9720
TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA AGCTCTAAGG TAAATATAAA 9780 
ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT GTTTCTGTAT TTTAGATTCC 9840 
AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGCAATGCC TTTAATGAGG AAAACCTGTT 9900 
TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT GCTGACTCTC AACATTCTAC 9960 
TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC TTTCCTTCAG AATTGCTAAG 10020 
TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT TGCTTTGCTA TTTACACCAC 10080 
AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA AAATATTCTG TAACCTTTAT 10140 
AAGTAGGCAT AACAGTTATA ATCATAACAT ACTGTTTTTT CTTACTCCAC ACAGCCATAG 10200 
AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC TTTAGCTTTT TAATTTGTAA 10260 
AGCGCTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT AGAGATCATA ATCAGCCATA 10320 
CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC ACACCTCCCC CTGAACCTGA 10380 
AACATAAAAT GAATGCAATT GTTCTTGTTA ACTTGTTTAT TGCAGCTTAT AATGGTTACA 10440 
AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT TTTTTCACTG CATTCTAGTT 10500 
GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG GATCCCCAGG AAGCTCCTCT 10560 
GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA TTCCAATCAT AGGCTGCCCA 10620
TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA AAAAGGAAAT TGGGTAGGGG 10680
TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC TGGGAAGTCC CTTCCACTGC 10740 
TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA CAGCAGAAAC ATACAAGCTG 10800 
TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA GCACTGTGGT TGCTGTGTTA 10860

GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT AGGTTCCAAA ATATCTAGTG 10920 

TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG GATAAGCATT ATCCTTATCC 10980 

AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA CTGTAGCATT TTTTGGGGTT 11040 

ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA CACCCTGCAG CTCCAAAGGT 11100 

TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA TGGGTTTTCC AGCACCATTT 11160 

TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG CAGTTACCCC AATAACCTCA 11220 

GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC AGGTTAAGTC CTCATTTAAA 11280 

TTAGGCAAAG GAA 11293 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10697 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT CCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 
CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT
TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 
TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 
CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 
ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 
GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 
TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 
AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 
GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 
CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 
CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 
TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 
TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 
TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 
GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 
ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 
CTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 
CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100
ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATCAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AAGATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAC ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 
TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660
GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720
GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780
CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840
GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900
TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960
GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020
TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080
ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140
AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200
AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260
ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320
CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380
AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440
CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500
CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560
GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620
CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680
CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740
AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800
GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860
ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920
GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GCAACTGATG 4980
AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040
CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100
GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160
TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTCTCT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 
GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 
CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT
TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 
TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 
TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 
CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 
CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCG 
CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 
CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC
TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACCCA CATCTGGTAT 
AAAAGGAGGG AGTGGCGCAC AGAGGAGCAC AGCTGTGTTT GGCTGCAGGG CCAAGAGCGC 
TGTCAAGAAG ACCCACACGC CCCCCTCCAG CAGCTGAATT CCAGCTGGCA TTCCGGTACT 
GTTGGTAAAA TGGAAGACGC CAAAAACATA AAGAAAGGCC CGGCGCCATT CTATCCTCTA 
GAGGATGGAA CCGCTGGAGA GCAACTGCAT AAGGCTATGA AGAGATACGC CCTGGTTCCT 
GGAACAATTG CTTTTACAGA TGCACATATC GAGGTGAACA TCACGTACGC GGAATACTTC
GAAATGTCCG TTCGGTTGGC AGAAGCTATG AAACGATATG GGCTGAATAC AAATCACAGA 
ATCGTCGTAT GCAGTGAAAA CTCTCTTCAA TTCTTTATGC CGGTGTTGGC CGCGTTATTT 
ATCGGAGTTG CAGTTGCGCC CGCGAACGAC ATTTATAATG AACGTGAATT GCTCAACAGT 
ATGAACATTT CGCAGCCTAC CCTACTGTTT GTTTCCAAAA AGGGGTTGCA AAAAATTTTG 
AACGTGCAAA AAAAATTACC AATAATCCAG AAAATTATTA TCATGGATTC TAAAACGGAT 
TACCAGGGAT TTCAGTCGAT GTACACGTTC GTCACATCTC ATCTACCTCC CGGTTTTAAT 
GAATACGATT TTGTACCAGA GTCCTTTGAT CGTGACAAAA CAATTGCACT GATAATGAAT 
TCCTCTGGAT CTACTGGGTT ACCTAAGGGT GTGGCCCTTC CGCATAGAAC TGCCTGCGTC 
AGATTCTCGC ATGCCAGAGA TCCTATTTTT GGCAATCAAA TCATTCCGGA TACTGCGATT 
TTAAGTGTTG TTCCATTCCA TCACGGTTTT GGAATGTTTA CTACACTCCC ATATTTGATA 
TGTGGATTTC GAGTCGTCTT AATGTATAGA TTTGAAGAAG AGCTGTTTTT ACGATCCCTT 
CAGGATTACA AAATTCAAAG TGCGTTGCTA GTACCAACCC TATTTTCATT CTTCGCCAAA 
AGCACTCTGA TTGACAAATA CGATTTATCT AATTTACACG AAATTGCTTC TGGGGGCGCA 8340
CCTCTTTCGA AAGAAGTCGG GGAAGCGGTT GCAAAACCCT TCCATCTTCC AGGGATACGA 8400 
CAAGGATATG GGCTCACTGA GACTACATCA GCTATTCTGA TTACACCCGA GGGGGATGAT 
AAACCGGGCG CGGTCGGTAA AGTTGTTCCA TTTTTTGAAG CGAAGGTTGT GGATCTGGAT 
ACCGGGAAAA CGCTGGGCGT TAATCAGAGA GGCGAATTAT GTGTCAGAGG ACCTATGATT 
ATCTCCGGTT ATGTAAACAA TCCGGAAGCG ACCAACGCCT TGATTGACAA GGATGGATGG 
CTACATTCTG GAGACATAGC TTACTGGGAC GAAGACGAAC ACTTCTTCAT AGTTGACCGC 
TTGAAGTCTT TAATTAAATA CAAAGGATAT CAGGTGGCCC CCGCTGAATT GGAATCGATA 
TTGTTACAAC ACCCCAACAT CTTCGACGCG GGCGTGGCAG GTCTTCCCGA CGATGACGCC 
GGTGAACTTC CCGCCGCCGT TGTTGTTTTG GAGCACGGAA AGACGATGAC GGAAAAACAG 
ATCGTGGATT ACGTCGCCAG TCAAGTAACA ACCGGGAAAA AGTTGCGCGG AGGAGTTGTG 
TTTGTGGACG AAGTACCGAA AGGTCTTACC GGAAAACTCG ACGCAAGAAA AATCAGAGAG
ATCCTCATAA AGGCCAAGAA GGGCGGAAAG TCCAAATTGT AAAATGTAAC TGTATTCAGC 
GATGACGAAA TTCTTAGCTA TTGTAATGAC TCTAGAGGAT CTTTGTGAAG GAACCTTACT
TCTGTGGTGT GACATAATTG GACAAACTAC CTACAGAGAT TTAAAGCTCT AAGGTAAATA 9180
TAAAATTTTT AAGTGTATAA TGTGTTAAAC TACTGATTCT AATTGTTTGT GTATTTTAGA 9240 
TTCCAACCTA TGGAACTGAT GAATGGGAGC AGTGGTGGAA TGCCTTTAAT GAGGAAAACC 
TGTTTTGCTC AGAAGAAATG CCATCTAGTG ATGATGAGGC TACTGCTGAC TCTCAACATT 
CTACTCCTCC AAAAAAGAAG AGAAAGGTAG AAGACCCCAA GGACTTTCCT TCAGAATTGC 
TAAGTTTTTT GACTCATGCT GTGTTTAGTA ATAGAACTCT TGCTTGCTTT GCTATTTACA 
CCACAAAGGA AAAAGCTGCA CTGCTATACA AGAAAATTAT GGAAAAATAT TCTGTAACCT 
TTATAACTAG GCATAACACT TATAATCATA ACATACTGTT TTTTCTTACT CCACACAGGC 
ATAGAGTGTC TGCTATTAAT AACTATGCTC AAAAATTGTG TACCTTTAGC TTTTTAATTT 9660 
CTAAAGCGGT TAATAAGGAA TATTTGATGT ATAGTGCCTT GACTAGAGAT CATAATCAGC 9720
CATACCACAT TTGTAGAGGT TTTACTTGCT TTAAAAAACC TCCCACACCT CCCCCTGAAG 9780 
CTGAAACATA AAATGAATGC AATTGTTGTT GTTAACTTGT TTATTGCAGC TTATAATGGT 9840 
TACAAATAAA GCAATAGCAT CACAAATTTC ACAAATAAAG CATTTTTTTC ACTGCATTCT 9900
AGTTGTGGTT TGTCCAAACT CATCAATGTA TCTTATCATG TCTGGATCCC CAGGAAGCTC 9960 

CTCTGTGTCC TCATAAACCC TAACCTCCTC TACTTGAGAG GACATTCCAA TCATAGGCTG 10020 

CCCATCCACC CTCTGTGTCC TCCTGTTAAT TAGGTCACTT AACAAAAAGG AAATTGGGTA 10080

GGGGTTTTTC ACAGACCGCT TTCTAAGGGT AATTTTAAAA TATCTGGGAA GTCCCTTCCA 10140 

CTGCTGTGTT CCAGAAGTGT TGGTAAACAG CCCACAAATG TCAACAGCAG AAACATACAA 10200 

GCTGTCAGCT TTGCACAAGG GCCCAACACC CTGCTCAGCA AGAAGCACTG TGGTTGCTGT 10260 

GTTAGTAATG TGCAAAACAG GAGGCACATT TTCCCCACCT GTGTAGGTTC CAAAATATCT 10320 

AGTGTTTTCA TTTTTACTTG GATCAGGAAC CCAGCACTCC ACTGGATAAG CATTATCCTT 10380 

ATCCAAAACA GCCTTGTGGT CAGTGTTCAT CTGCTGACTG TCAACTGTAG CATTTTTTGG 10440 

GGTTACAGTT TGAGCAGGAT ATTTGGTCCT GTAGTTTGCT AACACACCCT GCAGCTCCAA 10500 

AGGTTCCCCA CCAACAGCAA AAAAATGAAA ATTTGACCCT TGAATGGGTT TTCCAGCACC 10560 

ATTTTCATGA GTTTTTTGTG TCCCTGAATG CAAGTTTAAC ATAGCAGTTA CCCCAATAAC 10620 

CTCAGTTTTA ACAGTAACAG CTTCCCACAT CAAAATATTT CCACAGGTTA AGTCCTCATT 10680 

TAAATTAGGC AAAGGAA 10697 
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10549 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

CCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGCT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCCTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

CGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGC AGAAAGGCGG ACAGGTATCC 1680 
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TCTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG ACGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620 

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 
GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG
ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 
GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 
AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 
CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA
GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 
TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 
TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 
ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 
ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 
ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 

TTACTTGCTT TAAAAAACGT GGCACACCTC CCCCTGAACC TGAAACATAA AATGAATGGA

ATTGTTCTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 
ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC
ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 
AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 
CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 
TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 
GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 
CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 
AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 
ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTCTGCTC 
AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 
TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 
AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 
CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 
TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTACTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAAGTTCATC TATTTCCTCC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 7140 

ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 7200 

AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 7260 

TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 7320 

ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT TGCTTTTACA GATGCACATA 7380 

TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 7440 

TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 7500 

AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 7560 

ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 7620 

TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 7680 

AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 7740 

TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 7800 

ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 7860 

GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 7920 
TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 7980
TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 8040
GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 8100
TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 8160
CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 8220
TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 8280
CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 8340
CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 8400
GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 8460
CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 8520
ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 8580
ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 8640
CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGGC GTTGTTGTTT 8700
TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 8760
CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 8820
CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 8880
ACTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 8940
ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 9000
ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 9060
ACTACTGATT CTAATTCTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 9120
GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 9180
TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 9240
AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 9300
TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 9360
CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 9420
TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 9480




TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGCG GTTAATAAGG AATATTTGAT 9540 

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 9600 

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 9660 

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 9720 

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 9780 

TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 9840 

TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 9900 

ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 9960 

GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 10020 

AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 10080 

CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 10140 

TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 10200 

ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG CTCACTGTTC 10260 

ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 10320 

CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 10380 

AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 10440 

TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 10500 

ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 10549 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10558 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 
TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180
GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240
TCCCTTTTTT GGGGGATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300
AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 
CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 
AGTTCTGCTA TGTGGCGGGG TATTATCGCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 
CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 
TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 
TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 
CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 
ACCAAACGAC GAGCGTGACA CCACGATCCC TGCAGCAATG GCAACAACGT TGCGCAAACT 
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 
GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CCGTATCATT GCAGCACTGG GGCCAGATGG 
TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 
AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 
GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 
CTGAGGGTCA GACCCCCTAC AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 
CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 
TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 
TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500
TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC
GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 
ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 
GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATC 
CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 
ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 
TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 
TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 
ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 
CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 
CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 
TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 
AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 
GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 
CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 
TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 
TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 
AAAATGAATG. CAATTGTTGT TGTTAACTTG . TTTATTGCAG CTTATAATGG TTACAAATAA 
AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 
TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 
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AATTACTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATACCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

CTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGCT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTCGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAC GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTCTGTTC CAGAAGTGTT 5880

GGTAAACACC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGC 5940

CCCAACACCC TGCTCATCAA GAACCACTCT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGTGGGGAG TCAGCCGTGT ATCATCGCCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140 

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260 

CCAAAAACAT AAAGAAAGGC CCGGCCCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320

AGCAACTGCA TAAGCCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380 

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440 

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500 

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560 

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTCCAA AAAAAATTAC 7680 

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860 

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920 

ATCCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980 

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATCTGGATTT CGAGTCGTCT 8040 

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100 

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160 

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220 

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280 

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340 

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400 

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460 

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520 

CTTACTGGGA CGAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580 

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640 

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700 

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820 

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880 

AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940 

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG TGACATAATT 9000 

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060 

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120 

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180 

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240 

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT TGAGTCATGC 9300

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480 

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540 

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600 

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660 

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720 

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCCAAAC 9780 

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840 

CTAACCTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900 

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960 

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020 

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080 

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260 

TCAGTGTTCA TCTGCTGACT GTCAACTGTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320 

TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380 

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440 

GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500 

GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10569 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA CAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCCCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAACCCCTCC CCTATCGTAG TTATCTACAC GACGCGGACT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 
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TGAAGACCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGGCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACCGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCCTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACAGAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CACTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTCGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CACTCCAACC TCAGCCAGAC AAGGTTGTTG ACACAAGACC CACATCTGGT ATAAAAGGAG 7140 

GCAGTGGCCC ACAGAGGAGC ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA 7200 

AGACCCACAC GCCCCCCTCC AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA 7260 

AATGGAAGAC GCCAAAAACA TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG 7320 

AACCGCTGGA GAGCAACTGC ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT 7380 

TGCTTTTACA GATGCACATA TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC 7440 

CGTTCGGTTG GCAGAAGCTA TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT 7500 

ATGCAGTGAA AACTCTCTTC AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT 7560
TGCAGTTGCG CCCGCGAACG ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT 7620
TTCGCAGCCT ACCGTAGTGT TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA 7680
AAAAAAATTA CCAATAATCC AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG 7740
ATTTCAGTCG ATGTACACGT TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA 7800
TTTTGTACCA GAGTCCTTTG ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG 7860
ATCTACTGGG TTACCTAAGG GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC 7920
GCATGCCAGA GATCCTATTT TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT 7980
TGTTCCATTC CATCACGGTT TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT 8040
TCGAGTCGTC TTAATGTATA GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA 8100
CAAAATTCAA AGTGCGTTGC TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT 8160
GATTGACAAA TACGATTTAT CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC 8220
GAAAGAAGTC GGGGAAGCGG TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA 8280
TGGGCTCACT GAGACTACAT CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG 8340
CGCGGTCGGT AAAGTTGTTC CATTTTTTGA AGCGAAGGTT GTCGATCTGG ATACCGGGAA 8400
AACGCTGGGC GTTAATCAGA GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG 8460
TTATGTAAAC AATCCGGAAG CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC 8520
TGCAGACATA GCTTACTGGG ACGAAGACGA ACACTTCTTC ATACTTGACC GCTTGAAGTC 8580
TTTAATTAAA TACAAAGGAT ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA 8640
ACACCCCAAC ATCTTCGACG CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT 8700
TCCCGCCGCC GTTGTTGTTT TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA 8760
TTACGTCGCC AGTCAAGTAA CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA 8820
CGAAGTACCG AAAGGTCTTA CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT 8880
AAAGGCCAAG AAGGGCGGAA AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA 8940
AATTCTTAGC TATTGTAATG ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT 9000
GTGACATAAT TGGACAAACT ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT 9060
TTAAGTGTAT AATGTGTTAA ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC 9120
TATGGAACTG ATGAATGGGA GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC 9180

TCAGAAGAAA TGCCATCTAG TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT 9240 

CCAAAAAAGA AGAGAAAGGT AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT 9300 

TTGAGTCATG CTGTGTTTAG TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG 9360 

GAAAAAGCTG CACTGCTATA CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT 9420 

AGGCATAACA GTTATAATCA TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG 9480

TCTGCTATTA ATAACTATGC TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG 9540 

GTTAATAAGG AATATTTGAT GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC 9600

ATTTGTAGAG GTTTTACTTG CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA 9660 

TAAAATGAAT GCAATTGTTG TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA 9720

AAGCAATAGC ATCACAAATT TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG 9780 

TTTGTCCAAA CTCATCAATG TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT 9840 

CCTCATAAAC CCTAACCTCC TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA 9900 

CCCTCTGTGT CCTCCTGTTA ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGCTTTT 9960 

TCACAGACCG CTTTCTAAGG GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG 10020 

TTCCAGAAGT GTTGGTAAAC AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG 10080 

CTTTGCACAA GGGCCCAACA CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA 10140 

TGTGCAAAAC AGGAGGCACA TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT 10200 

CATTTTTACT TGGATCAGGA ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA 10260 

CAGCCTTGTG GTCAGTGTTC ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG 10320

TTTGAGCAGG ATATTTGGTC CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC 10380 

CACCAACAGC AAAAAAATGA AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT 10440 

GAGTTTTTTG TGTCCCTGAA TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT 10500 

TAACAGTAAC AGCTTCCCAC ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG 10560 

GCAAAGGAA 10569
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10558 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120
TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180
GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240
TCCCTTTTTT GCGGCATTTT GCCTTCCTCT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300
AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360
CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420
AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480
CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540
TACGGATGGC ATGACACTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600
TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660
CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720
ACCAAACGAC GACCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840
GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960
TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080
AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CGAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCC TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCACGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCCTCACGC GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCCCCTTTG ACTGAGCTGA TACCCCTCGC CGCAGCCGAA CGACCGACCG 1980 

CAGCGACTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACCCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CACTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTCTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGCTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGACTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGCGAGACA AGGTTGTTGA CACAAGAGCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140 

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260 

CCAAAAACAT AAAGAAAGGC CCGGCGCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320 
AGCAACTGCA TAAGGCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440 

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500 

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620 

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTGCAA AAAAAATTAC 7680 

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740 

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800 

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860 

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920 

ATGCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980 

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATGTGGATTT CGAGTCGTCT 8040 

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100 

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160 

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220 

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280 

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340 

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400 

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460 

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520 

CTTACTGGGA CCAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580 

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640 

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700 

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820 

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880 
AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAG TTCTGTGGTG TGACATAATT 9000

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060 

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180 

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240 

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT TGAGTCATGC 9300 

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360 

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480 

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540 

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600 

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660 

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720 

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCGAAAC 9780 

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840 

CTAACGTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900 

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960 

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020 

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080 

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140 

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200 

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260 

TCAGTGTTCA TCTGCTGACT GTCAACTCTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320 

TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380 

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440 
GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500
GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6245 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTCCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTCTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 
ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAAG TTCATCTATT TCCTCCCACA TCTGGTATAA AAGGAGGCAG 2820 

TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC AAGAGCGCTG TCAAGAAGAC 2880 

CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT CCGGTACTGT TGGTAAAATG 2940 

GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT ATCCTCTAGA GGATGGAACC 3000 

GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC TGGTTCCTGG AACAATTGCT 3060 

TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG AATACTTCGA AATGTCCGTT 3120 

CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA ATCACAGAAT CGTCGTATGC 3180 

AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG CGTTATTTAT CGGAGTTGCA 3240 

GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC TCAACAGTAT GAACATTTCG 3300 

CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA AAATTTTGAA CGTGCAAAAA 3360 

AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA AAACGGATTA CCAGGGATTT 3420 

CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG GTTTTAATGA ATACGATTTT 3480 

GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA TAATGAATTC CTCTGGATCT 3540 

ACTGGGTTAC CTAAGGGTGT GCCCCTTCCG CATAGAACTG CCTGCGTCAG ATTCTCGCAT 3600 

GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA CTGCGATTTT AAGTGTTGTT 3660 

CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT ATTTGATATG TGGATTTCGA 3720 

GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC GATCCCTTCA GGATTACAAA 3780 

ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT TCGCCAAAAG CACTCTGATT 3840 

GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG GGGGCGCACC TCTTTCGAAA 3900 

GAACTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG GGATACGACA AGGATATGGG 3960 

CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG GGGATGATAA ACCGGGCGCG 4020 
GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG ATCTGGATAC CGGGAAAACG 4080
CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC CTATGATTAT GTCCGGTTAT 4140
GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG ATGGATGGCT ACATTCTGGA 4200
GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG TTGACCGCTT GAAGTCTTTA 4260
ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG AATCGATATT GTTACAACAC 4320
CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG ATGACGCCGG TGAACTTCCC 4380
GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG AAAAAGAGAT CGTGGATTAC 4440
GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG GAGTTGTGTT TGTGGACGAA 4500
GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA TCAGAGAGAT CCTCATAAAG 4560
GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG TATTCAGCGA TGACGAAATT 4620
CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA ACCTTACTTC TGTGGTGTGA 4680
GATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA GGTAAATATA AAATTTTTAA 4740
GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT ATTTTAGATT CCAACCTATG 4800
GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA GGAAAACCTG TTTTGCTCAG 4860
AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC TCAACATTCT ACTCCTCCAA 4920
AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC AGAATTGCTA AGTTTTTTGA 4980
GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC TATTTACACC ACAAAGGAAA 5040
AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC TGTAACCTTT ATAAGTAGGC 5100
ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC ACACAGGCAT AGAGTGTCTG 5160
CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT TTTAATTTGT AAAGGGGTTA 5220
ATAAGGAATA TTTGATGTAT AGTCCCTTGA CTAGAGATCA TAATCAGCCA TACCACATTT 5280
GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC CCCTGAACCT GAAACATAAA 5340
ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC 5400
AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 5460
TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA GGAAGCTCCT CTGTGTCCTC 5520
ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC ATAGGCTGCC CATCCACCCT 5580
CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA ATTGGGTAGG CGTTTTTCAC 5640

AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT CCCTTCCACT GCTGTGTTCC 5700 

AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA ACATACAAGC TGTCAGCTTT 5760 

GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG GTTGCTGTGT TAGTAATGTG 5820 

CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA AAATATCTAG TGTTTTCATT 5880 

TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA TTATCCTTAT CCAAAACAGC 5940

CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA TTTTTTGGGG TTACAGTTTG 6000 

AGCAGGATAT TTGGTCCTGT AGTTTGCTAA CACACCCTGC AGCTCCAAAG GTTCCCCACC 6060 

AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT CCAGCACCAT TTTCATGAGT 6120 

TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC CCAATAACCT CAGTTTTAAC 6180 

AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG TGCTCATTTA AATTAGGCAA 6240 

AGGAA 6245 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6254 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 
TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 
GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 
TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 
AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTG AAGAAGTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 
GCCCTTTTCC TCGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACCCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520 

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAGT GGGGAGTCAG CCGTGTATCA TCGCCCACAT CTGGTATAAA 2820 

AGGAGGCAGT GGCCCACAGA GGAGCACAGC TGTGTTTGGC TGCAGGGCCA AGAGCGCTGT 2880 

CAAGAAGACC CACACGCCCC CCTCCAGCAG CTGAATTCCA GCTGGCATTC CGGTACTGTT 2940 

GGTAAAATGG AAGACGCCAA AAACATAAAG AAAGGCCCGG CGCCATTCTA TCCTCTAGAG 3000 

GATGGAACCG CTGGAGAGCA ACTGCATAAG GCTATGAAGA GATACGCCCT GGTTCCTGGA 3060 

ACAATTGCTT TTACAGATGC ACATATCGAG GTGAACATCA CGTACGCGGA ATACTTCGAA 3120 

ATGTCCCTTC GGTTGGCAGA AGCTATGAAA CGATATGGGC TGAATACAAA TCACAGAATC 3180 

GTCGTATGCA GTGAAAACTC TCTTCAATTC TTTATGCCGG TGTTGGGCGC GTTATTTATC 3240 

GGAGTTGCAG TTGCGCCCGC GAACGACATT TATAATGAAC GTGAATTGCT CAACAGTATG 3300 

AACATTTCGC AGCCTACCGT AGTGTTTGTT TCCAAAAAGG GGTTGCAAAA AATTTTGAAC 3360 

GTGCAAAAAA AATTACCAAT AATCCAGAAA ATTATTATCA TGGATTCTAA AACGGATTAC 3420 
CAGGGATTTC AGTCGATGTA CACGTTCGTC ACATCTCATC TACCTCCCGG TTTTAATGAA 3480

TACGATTTTG TACCAGAGTC CTTTGATCGT GACAAAACAA TTGCACTGAT AATGAATTCC 3540 

TCTGGATCTA CTGGGTTACC TAAGGGTGTG GCCCTTCCGC ATAGAACTGC CTGCGTCAGA 3600 

TTCTCGCATG CCAGAGATCC TATTTTTGGC AATCAAATCA TTCCGGATAC TGCGATTTTA 3660 

AGTGTTGTTC CATTCCATCA CGGTTTTGGA ATGTTTACTA CACTCGGATA TTTGATATGT 3720 

GGATTTCGAG TCGTCTTAAT GTATAGATTT GAAGAAGAGC TGTTTTTACG ATCCCTTCAG 3780 

GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC 3840 

ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 3900

CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 3960 

GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 4020 

CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 4080 

GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGAGGACC TATGATTATG 4140 

TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 4200 

CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 4260 

AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 4320 

TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 4380 

GAACTTCCCG CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 4440 

GTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 4500 

GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 4560 

CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 4620 

GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 4680 

GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG CTAAATATAA 4740 

AATTTTTAAC TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 4800 

CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATGAG GAAAACCTGT 4860 
TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA 4920

CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 4980 
GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA 5040
CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 5100
TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 5160
GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 5220
AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 5280
ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG 5340
AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400
AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460
TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520
TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580
ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640
GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTGGGAAGTC CCTTCCAGTG 5700
CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760
GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTGTT 5820
AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880
GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940
CAAAACACCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTCTAGCAT TTTTTGGGGT 6000
TACAGTTTGA GCAGGATATT TGGTCCTGTA GTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060
TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120
TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180
AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240
ATTAGGCAAA GGAA 6254

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 




(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 
CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGGACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTCTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520 

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCACT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGACCCACA 2820 
TCTGGTATAA AAGGAGGCAG TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC 2880

AAGAGCGCTG TCAAGAAGAC CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT 2940 

CCGGTACTGT TGGTAAAATG GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT 3000 

ATCCTCTAGA GGATGGAACC GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC 3060 

TGGTTCCTGG AACAATTGCT TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG 3120 

AATACTTCGA AATGTCCGTT CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA 3180 

ATCACAGAAT CGTCGTATGC AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG 3240 

CGTTATTTAT CGGAGTTGCA GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC 3300 

TCAACAGTAT GAACATTTCG CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA 3360 

AAATTTTGAA CGTGCAAAAA AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA 3420 

AAACGGATTA CCAGGGATTT CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG 3480 

GTTTTAATGA ATACGATTTT GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA 3540 

TAATGAATTC CTCTGGATCT ACTGGGTTAC CTAAGGGTGT GGCCCTTCCG CATAGAACTG 3600 

CCTGCGTCAG ATTCTCGCAT GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA 3660 

CTGCGATTTT AAGTGTTGTT CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT 3720

ATTTGATATG TGGATTTCGA GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC 3780 

GATCCCTTCA GGATTACAAA ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT 3840 

TCGCCAAAAG CACTCTGATT GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG 3900 

GGGGCGCACC TCTTTCGAAA GAAGTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG 3960 

GGATACGACA AGGATATGGG CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG 4020 

GGGATGATAA ACCGGGCGCG GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG 4080 

ATCTGGATAC CGGGAAAACG CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC 4140 

CTATGATTAT GTCCGGTTAT GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG 4200 

ATGGATGGCT ACATTCTGGA GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG 4260 

TTGACCGCTT GAAGTCTTTA ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG 4320 

AATCGATATT GTTACAACAC CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG 4380
ATGACGCCGG TGAACTTCCC GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG 4440

AAAAAGAGAT CGTGGATTAC GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG 4500 

GAGTTGTGTT TGTGGACGAA GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA 4560 

TCAGAGAGAT CCTCATAAAG GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG 4620 

TATTCAGCGA TGACGAAATT CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA 4680 

ACCTTACTTC TGTGGTGTGA CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA 4740 

GGTAAATATA AAATTTTTAA GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT 4800 

ATTTTAGATT CCAACCTATG GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA 4860 

GGAAAACCTG TTTTGCTCAG AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC 4920 

TCAACATTCT ACTCCTCCAA AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC 4980 

AGAATTGCTA AGTTTTTTGA GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC 5040 

TATTTACACC ACAAAGGAAA AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC 5100 

TGTAACCTTT ATAAGTAGGC ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC 5160 

ACACAGGCAT AGAGTGTCTG CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT 5220

TTTAATTTGT AAAGGGGTTA ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA 5280 

TAATCAGCCA TACCACATTT GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC 5340 

CCCTGAACCT GAAACATAAA ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT 5400 

ATAATGGTTA CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC 5460 

TGCATTCTAG TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA 5520 

GGAAGCTCCT CTGTGTCCTC ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC 5580 

ATAGGCTGCC CATCCACCCT CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA 5640 

ATTGGGTAGG GGTTTTTCAC AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT 5700 

CCCTTCCACT GCTGTGTTCC AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA 5760 

ACATACAAGC TGTCAGCTTT GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG 5820 

GTTGCTGTGT TAGTAATGTG CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA 5880 

AAATATCTAG TGTTTTCATT TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA 5940
TTATCCTTAT CCAAAACAGC CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA 6000

TTTTTTGGGG TTACAGTTTG AGCAGGATAT TTGGTCCTGT ACTTTCCTAA CACACCCTGC 6060 

AGCTCCAAAG GTTCCCCACC AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT 6120 

CCAGCACCAT TTTCATGAGT TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC 6180 

CCAATAACCT CAGTTTTAAC AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG 6240 

TCCTCATTTA AATTAGGCAA AGGAA 6265 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6254 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCCGCATTTT GCCTTCCTCT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGCTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTCCA 660 
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CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 
ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 
GGATAAAGTT GCAGGACCAC TT CTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 
TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 
AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 
GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 
CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 
CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 
TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 
TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATMGTCGTG 
TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 
GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 
ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 
GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 
GTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTCTC TGCTCCCGGC ATCCGCTTAC 
^'^-C KTTO0! «» K TCTC 4M ™cc CTm 2280
AAACGCGCGA GGCAGCCGAI CAIAAXCAGC CAXACCACAX TTGTAGACCT IXIaCIXGCX 234 „ 

xiaaaaaacc xgggaccgx gggggxgaag gxgaaagaxa aaaxgaaxgg aaxxgxxgxx 2m 

GTTAACTTGT TTATTCCACC TTATAATGGT XACAAAXAAA GGAAXAGGAX GAGAAAXXXG JMo 
AGAAAIaaaG CAIXXXXXXC ACTGCATTCT AGXXCIGGIX TGTCCAAACT CATCAATGTA 

-™gaxg tctcgatcat aaicacccax aggagattxg xagaggxttx acttccttta 

MAAAGG.GG GAGAGGXCGG CCTGAACCTG .AAACATAAAA XGAAXGGAAT TCTTCTTGTT 
AACTTGTTTA TtGGAGGTTA TAATGGTTAC AAATAAAGGA ATAGGAXGAG AAAXXXGAGA 
-MAGGAT ««, momor ccmacimi 

xaxcaxgicx ggatgggagg gagagaaggx xgxxgagaga AGAGGGAGAX ctggtaiaaa 

~GX GGGGGAGAGA GGaGGAGAGG XGXGXXXGGG XGGAGGGGGA AGAGGGCXGX 

gaagaagacc GAGAGGGGGG ggxcgaggag ctcaattcca CCTGGCATTC ccgtactgtt 

GGlAAAAIGG AAGAGGGGAA AAAGAXAAAG AAAGGGGGGG GGGGAXXGXA XGGXGXAGAG 
gatggaaggg GXGGAG.GGA actgcataag GGXAXGAAGA GAXAGGGGGX ggttcctgga 
ACAATTGCTT XXACAGAXGG AGAXAXGGAG GXGAAGAXGA GGXAGGGGGA AXAGXXGCAA 

° G ™ ~ ggatatgggc xgaaxagaaa xgagagaaxg 

GTCGIATGCA GTGAAAACTC XGXXGAAXXG TTTATGCCGC TGTTGGGCGC GTTATTTATC 

ggagttgcac xxgggggggg gaaggagaxx xaxaaxgaag gxgaaxxggx gaagagxaxg 

MGAXXXGGG AGCCTACCGI ACTCTTTCTT TCCAAAAAGC CGTTCC TTTTGAAC 

CTGGAAAAAA AAXXAGOAX aATGGAGAAA AXXAXXAXGA XGGAXXGXAA AAGGGATTAG 
CACGCATTTC AGXGGAXGXA GAGGXXGGXC ACATCTCATC TACCTCCCGG TTTTAATGAA 

-cgaxxxxg taccagactc ctttgatcgt gagaaaagaa xxggagxgax aaxgaaxxgg 
ictccaxcxa gxgggxxagg taagcgtgtc ccccttcccc axagaagxgg gxgggxgaga 
xxcxcccaig ggagagaxgg tatttttcgc aaxgaaaxca xxggggaxag tgcgatttta 

AGTGTTCTTC GAXXGGAXGA CGGTTTTGGA ATCTTTACTA GAGXGGGAXA XXXGAXAXCX 

ggaxxxggag xggxgxxaax gxaxagaxxx gaagaagagg xgxxxxxagg AXGGGXXGAG 
GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC
ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 
CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 
GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 
CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 
GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGACGACC TATGATTATG 
TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 
CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 
AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 
TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 
GAACTTCCCC CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 
CTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 
GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 
CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 
GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 
GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG GTAAATATAA 
AATTTTTAAG TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 
CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATCAC GAAAACCTGT 
TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA
CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 
GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA
CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 
TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 
GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 
AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 
ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG
AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400

AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460 

TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520 

TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580 

ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640 

GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTCGGAAGTC CCTTCCACTG 5700 

CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760 

GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTCTT 5820 

AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880 

GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940 

CAAAACAGCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTGTAGCAT TTTTTGGGCT 6000 

TACAGTTTGA GCAGGATATT TGGTCCTGTA CTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060 

TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120 

TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180 

AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240

ATTAGGCAAA GGAA 6254

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1442 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
GCTACCCAGC CTGCATAACC AGGAGGTGAG TGGCAGGTGA GTGAAATTTC ATCTGTAGTT 60 
ACAGCCACTC CTCATCACTC GCATTACCAC CAGAGCTCCA CTCCCTGTCA GATCAGCGGC 120

GGCATTAGAT TCTCATAGGA GCTCGAACCC TATTCTAAAC TGTTCATGTG AGGGATCTAG 180 

GTTGCAAGCT CCCTATGAGA ATCTAATGCC TGATGATCTG TCACGGTCTC CCATCACCCC 240 

TAGATGGGAC CATCTAGTTG CAGGAAAACA AGCTCAGGCT CCCACTGATT CTACACGATG 300 

GTGAATTGTG GAATTATTTC ATTATATATA TTACAATGTA ATAATAATAG AAATAAAGCA 360 

CACAATAAAT GTAATGTGCT TGAATCATCC CGAAACCATC CCACCCTGGT CTGTGAAAAA 420 

ATTGTCTTCC ATGAAACCAG TCCCTGGTGC CAAAAACGTT GAGGACCACT GCTCCACAGA 480 

ATCTATCGGT CACTCTTCCT CCCCTCACCC CCTTGCCCTA AAAGCACACC CTGCAAACCT 540

GCCATGAATT GACACTCTGT TTCTATCCCT TTTCCCCTTG TGTCTGTGTC TGGAGGAAGA 600 

GGATAAAGGA CAAGCTGCCC CAAGTCCTAG CGGGCAGCTC GAGGAAGTGA AACTTACACG 660 

TTGGTCTCCT GTTTCCTTAC CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA 720 

CCACCCCACC CAGCACACCT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC 780 

TCAGGGGCAC AGAGAGAGTC TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC 840 

CGGGCACATG GCAGGGATGA GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA 900 

CAGACAAAAC CTAGACAATC ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG 960 

GAGGAGGGAG GGGCGCTCTT TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG 1020

GGAAAACTTC CACGTTTTGA TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC 1080 

CAAAGGAAAA GCAGGCAACG TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC 1140 

CTAGGCTTTT TGGGTCACCC GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG 1200

ACAGACACAG GCAGAGGGCA GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT 1260 

TTGCTCAATT GTTCCTGAAT GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC 1320 

ACACACACAT GCCTCAGCAA GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT 1380 

TCAGACGGAC TCCCAGAGCC AGTGAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC 1440 

TG 1442 
(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 761 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC 
CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA 
GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCACATGG 
GGAAAGACCA AGAGTCCTCT GTTGGGCCCA AGTCCTAGAC AGACAAAACC
CGTGGCTGGC TGCATGCCTG TGGCTGTTGG GCTGGGCAGG AGGAGGGAGG 
CCTGGAGGTG GTCCAGAGCA CCGGGTGGAC AGCCCTGGGG GAAAACTTCC 
GGAGGTTATC TTTGATAACT CCACAGTGAC CTGGTTCGCC AAAGGAAAAG 
GAGCTGTTTT TTTTTTCTCC AAGCTGAACA CTAGGGGTCC TAGGCTTTTT 
GCATGGGAGA CAGTCAACCT GGCAGGACAT CCGGGAGAGA CAGACACAGG 
AAAGGTCAAG GGAGGTTCTC AGGCCAAGGC TATTGGGGTT TGCTCAATTG 
CTCTTACACA CGTACACACA CAGAGCAGCA CACACACACA CACACACATG 
TCCCAGAGAG GGAGGTGTCG AGGGGGACCC GCTGGCTCTT CAGACGGACT 
GTGAGTGGGT GGGGCTGGAA CATGAGTTCA TCTATTTCCT G 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC AGCACACCTC 60 
CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA GAGAGAGTCT 120 
GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCA 165 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGTTCATCTA TTTCCT 16
(2) INFORMATION FOR SEQ ID NO: 15:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GTGGGGAGTC AGCCGTGTAT CATCG

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CTCCAACCTC AGCCAGACAA GGTTGTTGAC ACAAGA 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GCCAGACAAG GTTGTTGACA CAAGA 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CCCACATCTG GTATAAAAGG AGGCAGTGGC CCACAGAGGA GCACAGCTGT GTTTGGCTGC 60 
AGGGCCAAGA GCGCTGTCAA GAAGACCCAC ACGCCCCCCT CCAGCAGCTG AATTC 115 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGCCAGACGC CAACAAGGTA GGAGCTGGAG CATTCGGGCT GGGTTTCACC CCACCGCACG 60 
GAGGCCTTTT GGGGTGGAGC CCTCAGGCTC AGGGCATACT ACAAACTTTG CCAGCAAATC 120 
CGCCTCCTGC CTCCACCAAT CGCCAGTCAG GAAGGCAGCC TACCCCGCTG TCTCCACCTT 180 
TGAGAAACAC TCATCCTCAG GCCATGCAGT GGAATTCCAC AACCTTCCAC CAAACTCTGC 240
AAGATCCCAG AGTGAGAGGC CTGTATTTCC CTGCTGGTGG CTCCAGTTCA GGAACAGTAA 300 
ACCCTGTTCT GACTACTGCC TCTCCCTTAT CGTCAATCTT CTCGA 345 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4302 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TCGACCTCCA GGGATCTTTG TGAAGGAACC TTACTTCTGT GGTGTGACAT AATTGGACAA 
ACTACCTACA GAGATTTAAA GCTCTAAGGT AAATATAAAA TTTTTAAGTG TATAATGTGT 
TAAACTACTG ATTCTAATTG TTTGTGTATT TTAGATTCCA ACCTATGGAA CTGATGAATG 
GGAGCACTGG TGGAATGCCT TTAATGAGGA AAACCTGTTT TCCTCAGAAC AAATGCCATC 
TAGTGATGAT GAGGCTACTG CTGACTCTCA ACATTCTACT CCTCCAAAAA AGAAGAGAAA 
GCTAGAAGAC CCCAAGGACT TTCCTTCAGA ATTGCTAAGT TTTTTGAGTC ATCCTGTCTT 
TAGTAATAGA ACTCTTGCTT GCTTTGCTAT TTACACCACA AAGGAAAAAG CTGCACTGCT 
ATACAAGAAA ATTATGGAAA AATATTCTGT AACCTTTATA AGTAGGCATA ACAGTTATAA 
TGATAACATA CTGTTTTTTC TTACTCCACA CAGCCATAGA GTGTCTGCTA TTAATAACTA 
TGCTCAAAAA TTCTCTACCT TTACCTTTTT AATTTGTAAA GGGGTTAATA AGGAATATTT 
GATGTATAGT GCCTTGACTA GAGATCATAA TCAGCCATAC CACATTTGTA GAGGTTTTAC 
TTGCTTTAAA AAACCTCCCA CACCTCCCCC TGAACCTGAA ACATAAAATG AATGCAATTG 
TTGTTGTTAA CTTGTTTATT GCAGCTTATA ATGGTTACAA ATAAAGCAAT AGCATCACAA 
ATTTCACAAA TAAAGCATTT TTTTCACTGC ATTCTAGTTG TGGTTTGTCC AAACTCATCA 
ATGTATCTTA TCATGTCTGG ATCCGGCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT 
CCCCAGGCTC CCCAGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG TCAGCAACCA 
GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT GCAAAGCATG CATCTCAATT
AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC GCCCCTAACT CCGCCCAGTT 
CCGCCCATTC TCCGCCCCAT CCCTCACTAA TTTTTTTTAT TTATGCAGAG GCCGAGGCCG 
CCTCGGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC CTAGGCTTTT 
GCAAAAAGCT TCACGCTGCC GCAAGCACTC AGGGCGCAAG GGCTGCTAAA GGAAGCGGAA 
CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG ATGAATGTCA GCTACTGGGC 
TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA GTGGGCTTAC 1380

ATGGCGATAG CTAGACTGGG GGGTTTTATG GACAGCAAGC GAACCGGAAT TGCCAGCTGG 1440 
GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC TGGATGGCTT TCTTGCCGCC 1500

AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG ACAGGATGAG GATCGTTTCG 1560 

CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC GCTTGGGTGG AGAGGCTATT 1620 

CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT GCCGCCGTGT TCCGGCTGTC 1680 

AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG TCCGGTGCCC TGAATGAACT 1740

GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG GGCGTTCCTT GCGCAGCTGT 1800 

CCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA TTGGGCGAAG TGCCGGGGCA 1860 

GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA TCCATCATGG CTGATGCAAT 1920 

GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC CACCACCAAG CGAAACATCG 1980 

CATCGAGCGA GCACGTACTC CGATCGAACC CGGTCTTGTC GATCAGGATG ATCTGGACGA 2040 

AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG CTCAAGGCGC GCATGCCCGA 2100 

CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG CCGAATATCA TGGTGGAAAA 2160 

TGGCCGCTTT TCTCGATTCA TCGACTGTGG CCGGCTGGGT GTGCCGGACC GCTATCAGGA 2220 

CATAGCGTTG GCTACCCGTG ATATTGCTGA AGAGCTTGGC GGCGAATGGG CTGACCGCTT 2280 

CCTCGTGCTT TACGGTATCC CCGCTCCCGA TTCGCAGCGC ATCGCCTTCT ATCGCCTTCT 2340 

TGACGAGTTC TTCTGAGCGG GACTCTGGGG TTCGAAATGA CCCACCAAGC GACGCCCAAC 2400 

CTGCCATCAC GAGATTTCGA TTCCACCGCC GCCTTCTATG AAAGGTTGGG CTTCGGAATC 2460 

GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG ATCTCATGCT GGAGTTCTTC 2520 

GCCCACCCCG GGCTCGATCC CCTCGCGAGT TGGTTCAGCT GCTGCCTGAG GCTGGACGAC 2580 

CTCGCGGAGT TCTACCGGCA GTGCAAATCC GTCGCCATCC AGGAAACCAG CAGCGGCTAT 2640 

CCGCGCATCC ATGCCCCCGA ACTGCAGGAG TGGGGAGGCA CGATGGCCGC TTTGGTCCCG 2700

GATCTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC TACCTACAGA 2760 

GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA AACTACTGAT 2820 

TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG AGCAGTGGTG 2880 
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GAATGGGm AATGAGGAAA AGGXG^G aaaieu ^ 

GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AACAGAAACG TAGAAGACCC 
GAAGGAG TO CCTTCACAAT *„, maaciJ 

««« TTTGCTATTT ACACCACAAA GGAAAAAGCT CCACTCCTAT AGAAGAAmJ 

~- «~ gg^axaag .ggga.aag ag^aa* a^agT 
™t agxggagaga gggaxagag, gxgxggxau aataactatc ctcaaaaatt 

GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGX.AA.AAG GAA^GA J^Z 
A^ — ' ~ ~- «~ JZ 

IT ~* — ~ 

™gg agcttataat ggxxagaaax aaagcaatac gatcacaaa. ttcacaaata 

ATGTCTGGAT GCGGAGGAAG GXCG^G ^CA.AAA GGGTAAGG, JZ 

~ ™ ~- — — ~ 

CTTAACAAAA AGGAAATTGG GTAGGGCTTT rr,„ 

GGGTTT "CACAGACC GCTTTCTAAG GGTAATTTTA 

~ i™ ~ ~ — - CAC ; 
zr caagctgtca gctttgcaca agggcccaac ~ 

~ ™ C TGTGTTAGTA ATGTGCAAAA — ATTTTCCCCA 
~ ^ ~ «~ AACCCAGCAC 

~ r cATTATc cttatccma acagccttct cg ~ ~* 

-TCAACXG TACCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT CCTGTAGTTT 

mc cm ccaccmcag — — 

CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA ATGCAAGTTT 

AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACACTAA CAGCTTCCCA CATCAAAATA

TTTCCACACG TTAAGTCCTC ATTTAAATTA GGCAAACGAA TT 4302
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6170 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCCTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGCC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TCTCAGACCA 1080 



AGTmGTGA UTAtAGTTT A0AmAm AAAACTTCAT TTTTAATTTA AAAGGATCTA
CGTCAAGATC CTTTTTGATA AIGIGATGAG CAAAATCCCI TAACGTGAGT tttggttgga 

ctgagcgtca gagggggiag aaaagaigaa aggatgttgt tgagatggtt tttttctccg 

CGTAATCTCC TGCTTGCAAA GAAAAAAAGG ACCGCTACCA GCGGTGGTTT GITTGGCGGA 

TCAAGAGCTA CCAACTCTTT TTGGGAAGGT aactgggttg AGGAGAGGGG agataggaaa 
IACTGTCCTT CTAGTGTAGC ' CGTAGTTAGG GGAGGACTTG AAGAAGTGTG IAGGAGGGGG 
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCGAGTGGGG ATAAGTCGTG 
TGTTAGGGGG TTGGAGIGAA CACGATAGTT AGGGGAIAAG GGGGAGGGGT CGCGCTGAAC 
GGGGGGTTCC TGCACACAGC CCAGCTTGGA GGGAAGGAGG TAGAGGGAAG TGAGATACCT 
ACACCGTGAG CATTGAGAAA GGGGGAGGGI TGGGGAAGGG AGAAAGGCGG AGAGGTATGG 
GGTAAGCGGC AGGCTCGCAA CAGCAGACCG CACGAGGGAG CTTCCACGGG GAAACGCCTG 
GTATC1TTAT AGTCCTGTCG GGTTTGGGG. CCTCTCACTT GAGCGTCGAT TTTTCTGATG 
CTCGTCAGGC GGCCGGAGCC TATGGAAAAA CCCCAGCAAC CCGGCCTTTT TAGGGTTCGT 
GGCCTTTTCC TGGCCTTTTG CTGACATGTT GTTTCGTGGG TTATCCCCTG ATTCTGTGGA 
TAACCGTATT AGGGCGTTTG AGTGAGGTGA TACCGCTCGC GGGAGGGGAA GGAGGGAGGG 
CAGGGAGICA GTGAGGGAGG AACCGCAAGA GCGCCTCATG CGGTATTTTC TCC1TACGCA 
TCTGTGCCCT AITTCACACC GCATATCGTG CACTCTCAGT ACAATCTGCT GIGATGGGGC 
atagttaagc GAGIAIAGAG TCCGCTATCG giaggkagi GGGTGATGGG IGGGGCGGGA 
GAGGGGCGAA GAGGGGGTGA CCCGCCCTGA GGGGCITGTG TCCTCCCGGC ATCCGCTTAC 
AGAGMGGTG TGAGGGTCTG GGGGAGCIGG ATGTCTCAGA GCTTTTCACC GTCATCACCC 
AAAGGGGGGA GGGAGGGGAT CATAATCAGC CATACCACAT TTGTAGACCT TTTACTTGCT 
TTAAAAAAGG TCCCACACCT GGGGGIGAAG GTGAMGAIA AAATGAATGC AATTGTTGTT 

gttaacttgt ttaitggagg iiaiaaiggi tagaaataaa ggaaiagcat cagaaattic 

ACAAATAAAG GATTTTTTTC ACTCCATTCT ACTTCTGGTT TGTGGAAAGI GATGAAIGIA 
TGTTAtGAIG. TCTCGATCAT AATGAGGGAT ACCACArTTG TAGAGGTTTT AGTTGGTTTA 
AAAAACCTCC GAGACGTGGG CCTGAACCTG AAACATAAAA TGAATGCAAT IGTTGTTCTT 



AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700
AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAAG CTTGCATGCC TGCAGGTCGA CTCTAGAGGA TCCCCGGGTA 2820

CCGAGCTCGA ATTCCAGCTG GCATTCCGGT ACTGTTGGTA AAATGGAAGA CGCCAAAAAC 2880

ATAAAGAAAG GCCCGGCGCC ATTCTATCCT CTAGAGGATG GAACCGCTGG AGAGCAACTG 2940

CATAAGGCTA TGAAGAGATA CGCCCTGGTT CCTGGAACAA TTGCTTTTAC AGATGCAGAT 3000

ATCGAGGTGA ACATCACGTA CGCGGAATAC TTCGAAATGT CCGTTCGGTT GGCAGAAGCT 3060

ATGAAACGAT ATGGGCTGAA TACAAATCAC AGAATCGTCG TATGCAGTGA AAACTCTCTT 3120

CAATTCTTTA TGCCGGTGTT GGGCGCGTTA TTTATCGGAG TTGCAGTTGC GCCCGCGAAC 3180

GACATTTATA ATGAACGTGA ATTGCTCAAC AGTATGAACA TTTCGCAGCC TACCGTAGTG 3240

TTTGTTTCCA AAAAGGGGTT GCAAAAAATT TTGAACGTGC AAAAAAAATT ACCAATAATC 3300

CAGAAAATTA TTATCATGGA TTCTAAAACG GATTACCAGG GATTTCAGTC GATGTACACG 3360

TTCGTCACAT CTCATCTACC TCCCGGTTTT AATGAATACG ATTTTGTACC AGAGTCCTTT 3420

GATCGTGACA AAACAATTGC ACTGATAATG AATTCCTCTG GATCTACTGG GTTACCTAAG 3480

GGTGTGGCCC TTCCGCATAG AACTGCCTGC GTCAGATTCT CGCATGCCAG AGATCCTATT 3540

TTTGGCAATC AAATCATTCC GGATACTGCG ATTTTAAGTG TTGTTCCATT CCATCACGGT 3600

TTTGGAATGT TTACTACACT CGGATATTTG ATATGTGGAT TTCGAGTCGT CTTAATGTAT 3660

AGATTTGAAG AAGAGCTGTT TTTACGATCC CTTCAGGATT ACAAAATTCA AAGTGCGTTG 3720

CTAGTACCAA CCCTATTTTC ATTCTTCGCC AAAAGCACTC TGATTGACAA ATACGATTTA 3780

TCTAATTTAC ACGAAATTGC TTCTGGGGGC GCACCTCTTT CGAAAGAAGT CGGGGAAGCG 3840

GTTGCAAAAC GCTTCCATCT TCCAGGGATA CGACAAGGAT ATGGGCTCAC TGAGACTACA 3900

TCAGCTATTC TGATTACACC CGAGGGGGAT GATAAACCGG GCGCGGTCGG TAAAGTTGTT 3960

CCATTTTTTG AAGCGAAGGT TGTGGATCTG GATACCGGGA AAACGCTGGG CGTTAATCAG 4020

AGAGGCGAAT TATGTGTCAG AGGACCTATG ATTATGTCCG GTTATGTAAA CAATCCGGAA 4080

GCGACCAACG CCTTGATTGA CAAGGATGGA TGGCTACATT CTGGAGACAT AGCTTACTGG 4140

GACGAAGACG AACACTTCTT CATAGTTGAC CGCTTGAAGT CTTTAATTAA ATACAAAGGA 4200
TATCAGGTGG CCGCCGCTGA ATTGGAATCG ATATTGTTAC AACACCCCAA CATCTTCGAC
GCGGGCGTGG CAGGTCTTCC CGACGATGAC GCCGGTGAAC TTCCCCCCGC CGTTGTTGTT
TTGGAGCACG GAAAGACGAT GACGGAAAAA GAGATCGTGG ATTACGTCGC GAGTCAAGTA
ACAACCGCGA AAAAGTTGCG CGGAGGAGTT GTGTTTGTGG ACGAAGTACC GAAAGGTCTT 
ACGGGAAAAC TCGACGCAAG AAAAATCAGA GAGATCCTCA TAAAGGCCAA GAAGGGCGGA 
AAGTCCAAAT TGTAAAATGT AACTGTATTC AGCGATGACC AAATTCTTAG CTATTGTAAT 
GACTCTAGAG GATGTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC 
TACCTACAGA GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA 
AACTACTGAT TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG 
AGCAGTGGTG GAATGGCTTT AATGAGGAAA ACCTGTTTTG GTGAGAAGAA ATGCCATCTA 
GTGATGATGA GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AAGAGAAAGG 
XAGAAGACCC CAAGGACTTT CCTTCAGAAT TGCTAAGTTT TTTGAGTCAT GCTGTGTTTA 
GTAATAGAAC TCTTGCTTGC TTTGCTATTT ACACCACAAA GGAAAAAGCT GCACTGCTAT 

'ACAAGAAAAT iatggaaaaa tattgtgtaa cctttataag taggcataac agtxataatc 
ataacatact gttttttctt AGTGCAGACA ggcatagagt gtctggtatt aataactatg 

CTCAAAAATT GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGTTAATAAG GAATATTTGA 
TGTATAGTGC CTTGACTAGA GATCATAATG AGCCATACCA CATTTGTAGA GGTTTTAGTT 
GCTTTAAAAA ACCTGCCACA CCTCCCCCTG AACCTGAAAC ATAAAATGAA TGCAATTGTT
GTTGTTAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 
TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA ACTCATCAAT 
GTATCTTATC ATGTCTGGAT CCCCAGGAAG CTCCTCTCTG TCCTCATAAA CCCTAACCTC 
CTCTACTTGA GAGGACATTC CAATCATAGG CTGCCCATCC ACCCTCTGTG TCCTCCTGTT 
AATTAGGTCA CTTAACAAAA AGGAAATTGG GTAGGGCTTT TTCACAGACC GCTTTCTAAG 

GGTAATTTTA aaatatctgg gaagtccctt ccactggtgt gttccagaag TGTTGGTAAA 

CAGCCCACAA ATGTCAACAG CAGAAACATA CAAGCTGTCA GCTTTGCACA AGGGCCCAAC 
ACCCTGCTCA GCAAGAAGCA CTGTGGTTGC TGTGTTAGTA ATGTGCAAAA CAGGAGGCAC 



ATTTTCCCCA CCTGTGTAGG TTCCAAAATA TCTAGTGTTT TCATTTTTAC TTGGATCAGG 5820

AACCCAGCAC TCCACTGGAT AAGCATTATC CTTATCCAAA ACAGCCTTGT GGTCAGTGTT 5880 

CATCTGCTGA CTGTCAACTG TAGCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT 5940 

CCTGTAGTTT GCTAACACAC CCTGCAGCTC CAAAGGTTCC CCACCAACAG CAAAAAAATG 6000 

AAAATTTGAC CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA 6060 

ATGCAAGTTT AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACAGTAA CAGCTTCCCA 6120 

CATCAAAATA TTTCCACAGG TTAAGTCCTC ATTTAAATTA GGCAAAGGAA 6170 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10533 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGCC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 
TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATGTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AACAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGC AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC CCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 
TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220
TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTC TGTATTTTAG ATTCCAACCT 2280 
ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 
CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400
CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT CCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 
GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780
CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840
GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900
TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960
GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020
TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080
ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140
AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200
AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260
ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320
CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380
AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440
CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500
CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560
GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620
CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680
CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740
AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800
GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860
ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920
GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980
AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040
CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100
GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160
TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220
TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280
ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340
ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TCGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTCTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840



TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080


CACCCACATC TfiGTATAAAA fifJAPPr'ArTr C*rrT & P* A P" 1 A papPa CAnnT r*T*r+r*i**nr>r%i~*r* 
uav>uuauni u lUOlAlftflftA VjVjrtAjkjLiAVj 1 0 OL.UGALAGAG GAG GAG AG CT GTGTTTGGCT 


7140 


TC 


GCAGGGCCAA GAGCGCTGTC AAP.AAP,APPP ACirrrrrrr ptppappapp tpaattppap 

wwa^wwwwwva wavuwuiuiu /I/aU/VUjAVjL.L. AL»AL»OL>L>L-UU G 1 GGAGGAGG 1GAATTCCAG 


7200 


ga' ; 


CTGGCATTCC GGTACTGTTG GTAAAATPPA APAPPPPA A A AAPaTaaapa AArprrrppr 
* wiAAttAiVjbA A^AUOUUAAA AAGAiAAAGA AAGGGGGGGG 


7260 


GO 


GCCATTCTAT CCTCTAGAGG ATGGAAPPftP tppapappaa PTrrATAArr>TATpAAPAp 
*** * w ftiuuAftLiuou J.Vj\jAIjA<jGAA G 1GGA1AAGG GTATGAAGAG 


7320 


j 

AA( ] 


ATACGCCCTG GTTCCTGGAA CAATTGPTTT TAPApatppa patatppapp TPAArAT^Ap 
•A*A»wwww>4.w wiiwuiwvwn ufvnxiULriii lALiAGAiGGA GA1AJ.GGAGG iGAACATCAC 


^ n o n 

7380 


TGV ! 
i 


GTACGCGGAA TACTTCGAAA TGTf!PP.TTPP pttpppapaa p pt a tp aaat patatppppt 
*" WWWVMJ **" a a wutuui iuibuoi, 1UL> vjiiL»bUAbAA GC1AJ.GAAAG GAT AT G G G CT 


7440 


^ 1 
GT< j 


G AAT A C AAAT CACAGAATCG TP.PTATPPAP TP a a a a ptpt rrrr a a ttpt t*t« A 
wiunnwnnni wnunuArtl 1 UOiAlUUAU X IjAAAAL- 101 Gil GAA IT G 1 TTATGCCGGT 


7500 


AG( : 


GTTGGGCGCG TTATTTATPn P.APTTPPAPT Trmmmm aappapattt a«fa at^a a r*r+ 
iia;ixaiou uAu i l LruAG l 1GGGGGGGCG AACGACATTT ATAATGAACG 


7560 


GTj 


TGAATTGCTC AACAGTATPA APATTTPPPA rrrTArrrrA rTPTTTPTTT a a a a a r+r^r* 
iunniiuuAo xvnonvjxAiuA. /iOAiiiuuUA ouL» 1 AGGG 1A G 1 Gl IT GTTT GGAAAAAGGG 


7620 


m ; 


GTTGCAAAAA ATTTTP A A PP TPPAAAAAAA ATTAPruTH a — , — — a r-i a a a a tt« a »r*»r> a T</-t < «n 

unuwuwiA ivjaauij iooaaaaaaa AliAGCAATA ATCCAGAAAA TTATTATCAT 


7680 


GCl 


GGATTCTAAA A P. P.P ATT A PP APPPATTTPa PTrr*A tpt a n a ApmmppTr* a ^tTpntoA7<n<f 
uuaxxoxaaa auvj^ai a^uljAI i iga G1GGAIGTAC ACGTTCGTCA CATCTCATCT 


7740 


TT3 


APPTPPPPPT TTT A A TP A A T A PP A TTTTPT a r> r* a a /*»*t» r+ r+ tti'/< a t^/^^ti^ a^» «' » . — . 4 _ 

/*v*oj.uuv^ui iiiAAivjAAl AGGA1111GI ACCAGAGTCC TTTGATCGTG ACAAAACAAX 


7800 


TGC ■ 


TPPAPTP ATA ATPAATTPPT PTPPATPTap Tr+ryr*TT*K a a ^^%^>*¥»^»r»<n<-» ji.-i. unm, > _ 

iuwiuiUAiA AiuAAi i go 1 GlGGAiGiAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA 


7860 


AAA 


TAP«A APTP.PP TPPPTP A P A T TPTrrPATPP /^AOA/^A'Pf*/*'!' A T*»T»>f*T**T»^/t/^ a a T<s* t. a i i*i/«<>n 

x/»vjw*uiuoo ioultioaoai IGIGGUAIGC CAGAGATCCT AiilliGGCA ATCAAATCAT 


7920 


CTT 


TCCGGATACT ' CPPATTTTA A PTPTTPTTPP attppatpap nr* t r r r~T"rr>r* a a t^tw a r»n» a 
ivuuuniA^i uubAiiiiAA oioilGliGG AIIGGATGAC GGii.ij.GGAA TGTTTACTAC 


7980 


„ - i 

TTT i 


ACTCGGATAT TTGATATGTG GATTTCGAGT CGTCTTAATG TATAGATTTG AAGAAGAGCT 8040

GTTTTTACGA TCCCTTCAGG ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT 8100

TTCATTCTTC GCCAAAAGCA CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT 8160

TGCTTCTGGG GGCGCACCTC TTTCGAAAGA AGTCGGGGAA GCGGTTGCAA AACGCTTCCA 8220

TCTTCCAGGG ATACGACAAG GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC 8280

ACCCGAGGGG GATGATAAAC CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGCGAA 8340

GGTTGTGGAT CTGGATACCG GGAAAACGCT GGGCGTTAAT CAGAGAGGCG AATTATGTGT 8400



CAGAGGACCT ATGATTATGT CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT 8460 

TGACAAGGAT GGATGGCTAC ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT 8520 

CTTCATAGTT GACCGCTTGA AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC 8580 

TGAATTGGAA TCGATATTGT TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT 8640 

TCCCGACGAT GACGCCGGTG AACTTCCCGC CGCCGTTGTT GTTTTGGAGC ACGGAAAGAC 8700 

GATGACGGAA AAAGAGATCG TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT 8760 

GCGCGGAGGA GTTGTGTTTG TGGACGAAGT ACCGAAAGGT CTTACCGGAA AACTCGACGC 8820 

AAGAAAAATC AGAGAGATCC TCATAAAGGC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA 8880 

TGTAACTGTA TTCAGCGATG ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT 8940 

GTGAAGGAAC CTTACTTCTG TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA 9000

AGCTCTAAGG TAAATATAAA ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT 9060 

GTTTGTGTAT TTTAGATTCC AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGGAATGCC 9120 

TTTAATGAGG AAAACCTGTT TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT 9180 

GCTGACTCTC AACATTCTAC TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC 9240 

TTTCCTTCAG AATTGCTAAG TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT 9300 

TGCTTTGCTA TTTACACCAC AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA 9360 

AAATATTCTG TAACCTTTAT AAGTAGGCAT AACAGTTATA ATCATAACAT ACTCTTTTTT 9420

CTTACTCCAC ACAGGCATAG AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC 9480 

TTTAGCTTTT TAATTTGTAA AGGGGTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT 9540 

AGAGATCATA ATCAGCCATA CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC 9600 

ACACCTCCCC CTGAACCTGA AACATAAAAT GAATGCAATT GTTCTTGTTA ACTTGTTTAT 9660 

TGCAGCTTAT AATGGTTACA AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT 9720 

TTTTTCACTG CATTCTAGTT GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG 9780 

GATCCCCAGG AAGCTCCTCT GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA 9840 

TTCCAATCAT AGGCTGCCCA TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA 9900

AAAAGGAAAT TGGGTAGGGG TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC 9960 
TGGGAAGTCC CTTCCACTGC TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA 10020

CAGCAGAAAC ATACAAGCTG TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA 10080

GCACTGTGGT TGCTGTGTTA GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT 10140

AGGTTCCAAA ATATCTAGTG TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG 10200

GATAAGCATT ATCCTTATCC AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA 10260

CTGTAGCATT TTTTGGGGTT ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA 10320

CACCCTGCAG CTCCAAAGGT TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA 10380

TGGGTTTTCC AGCACCATTT TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG 10440

CAGTTACCCC AATAACCTCA GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC 10500

AGGTTAAGTC CTCATTTAAA TTAGGCAAAG GAA 10533

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6229 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

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AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGGCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT ACTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGGGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCACC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 2820

ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 2880

AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 2940

TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 3000

ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGCAACAAT TGCTTTTACA GATGCACATA 3060

TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 3120

TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 3180

AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 3240

ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 3300

TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 3360

AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 3420

TCGTCACATC TCATCTACCT CCCCGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 3480

ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 3540

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GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 3600 

TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 3660 

TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 3720 

GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 3780 

TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 3840 

CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 3900 

TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 3960 

CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 4020 

CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 4080 

GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 4140 

CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 4200 

ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 4260 

ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 4320 

CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGCC GTTGTTGTTT 4380 

TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 4440 

CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 4500 

CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 4560 

AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 4620 

ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 4680 

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 4740 

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 4800 

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 4860 

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 4920 

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 4980 

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 5040 

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 5100 
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TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 5160 

TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 5220 

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 5280 

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 5340 

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 5400 

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 5460

TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 5520

TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 5580 

ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 5640 

GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 5700 

AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 5760 

CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 5820 

TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 5880 

ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG GTCAGTGTTC 5940 

ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 6000 

CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 6060 

AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 6120 

TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 6180 

ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 6229 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10768 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 
ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCCTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

ACTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 
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TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTCTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAACACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTCTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCCCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCGT TCTATGAAAG 4560

GTTGGGCTTC GGAATCCTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATCCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTC CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTCTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGCATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATGTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 
TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGGCCAGAC GCCAACAAGG TAGGAGCTGG AGCATTCGGG CTGGGTTTCA CCCCACCGCA 7140 

CGGAGGCCTT TTGGGGTGGA GCCCTCAGGC TCAGGGCATA CTACAAACTT TGCCAGCAAA 7200 

TCCGCCTCCT GCCTCCACCA ATCGCCAGTC AGGAAGGCAG CCTACCCCGC TGTCTCCACC 7260 

TTTGAGAAAC ACTCATCCTC AGGCCATGCA GTGGAATTCC ACAACCTTCC ACCAAACTCT 7320 

GCAAGATCCC AGAGTGAGAG GCCTGTATTT CCCTGCTGGT GGCTCCAGTT CAGGAACAGT 7380 

AAACCCTGTT CTGACTACTG CCTCTCCCTT ATCGTCAATC TTCTCGAAAT TCCAGCTGGC 7440

ATTCCGGTAC TGTTGGTAAA ATGGAAGACG CCAAAAACAT AAAGAAAGGC CCGGCGCCAT 7500 

TCTATCCTCT AGAGGATGGA ACCGCTGGAG AGCAACTGCA TAAGGCTATG AAGAGATACG 7560 

CCCTGGTTCC TGGAACAATT GCTTTTACAG ATGCACATAT CGAGGTGAAC ATCACGTACG 7620 

CGGAATACTT CGAAATCTCC GTTCGGTTGG CAGAAGCTAT GAAACGATAT GGGCTGAATA 7680
CAAATCACAG AATCGTCGTA TGCAGTGAAA ACTCTCTTCA ATTCTTTATG CCGGTGTTGG 7740

GCGCGTTATT TATCGGAGTT GCACTTGCGC CCGCGAACGA CATTTATAAT GAACGTGAAT 7800

TGCTCAACAG TATGAACATT TCGCAGCCTA CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC 7860

AAAAAATTTT GAACGTGCAA AAAAAATTAC CAATAATCCA GAAAATTATT ATCATGGATT 7920

CTAAAACGGA TTACCAGGGA TTTCAGTCGA TGTACACGTT CGTCACATCT CATCTACCTC 7980

CCGGTTTTAA TGAATACGAT TTTGTACCAG AGTCCTTTGA TCGTGACAAA ACAATTGCAC 8040

TGATAATGAA TTCCTCTGGA TCTACTGGGT TACCTAAGGG TGTGGCCCTT CCGCATAGAA 8100

CTGCCTGCGT CAGATTCTCG CATGCCAGAG ATCCTATTTT TGGCAATCAA ATCATTCCGG 8160

ATACTGCGAT TTTAAGTGTT GTTCCATTCC ATCACGGTTT TGGAATGTTT ACTACACTCG 8220

GATATTTGAT ATGTGGATTT CGAGTCGTCT TAATGTATAG ATTTGAAGAA GAGCTGTTTT 8280

TACGATCCCT TCAGGATTAC AAAATTCAAA GTGCGTTGCT AGTACCAACC CTATTTTCAT 8340

TCTTCGCCAA AAGCACTCTG ATTGACAAAT ACGATTTATC TAATTTACAC GAAATTGCTT 8400

CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG GGGAAGCGGT TGCAAAACGC TTCCATCTTC 8460

CAGGGATACG ACAAGGATAT GGGCTCACTG AGACTACATC AGCTATTCTG ATTACACCCG 8520

AGGGGGATGA TAAACCGGGC GCGGTCGGTA AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG 8580

TGGATCTGGA TACCGGGAAA ACGCTGGGCG TTAATCAGAG AGGCGAATTA TGTGTCAGAG 8640

GACCTATGAT TATGTCCGGT TATGTAAACA ATCCGGAAGC GACCAACGCC TTGATTGACA 8700

AGGATGGATG GCTACATTCT GGAGACATAG CTTACTGGGA CGAAGACGAA CACTTCTTCA 8760

TAGTTGACCG CTTGAAGTCT TTAATTAAAT ACAAAGGATA TCAGGTGGCC CCCGCTGAAT 8820

TGGAATCGAT ATTGTTACAA CACCCCAACA TCTTCGACGC GGGCGTGGCA GGTCTTCCCG 8880

ACGATGACGC CGGTGAACTT CCCGCCGCCG TTGTTGTTTT GGAGCACGGA AAGACGATGA 8940

CGGAAAAAGA GATCGTGGAT TACGTCGCCA GTCAAGTAAC AACCGCGAAA AAGTTGCGCG 9000

GAGGAGTTGT GTTTGTGGAC GAAGTACCGA AAGGTCTTAC CGGAAAACTC GACGCAAGAA 9060

AAATCAGAGA GATCCTCATA AAGGCCAAGA AGGGCGGAAA GTCCAAATTG TAAAATGTAA 9120

CTGTATTCAG CGATGACGAA ATTCTTAGCT ATTGTAATGA CTCTAGAGGA TCTTTGTGAA 9180

GGAACCTTAC TTCTGTGGTG TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC 9240

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TAAGGTAAAT ATAAAATTTT TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG 9300 

TGTATTTTAG ATTCCAACCT ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA 9360 

TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA 9420 

CTCTCAACAT TCTACTCCTC CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC 9480 

TTCAGAATTG CTAAGTTTTT TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT 9540 

TGCTATTTAC ACCACAAAGG AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA 9600 

TTCTGTAACC TTTATAAGTA GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC 9660 

TCCACACAGG CATAGAGTGT CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG 9720 

CTTTTTAATT TGTAAAGGGG TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA 9780 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 9840 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTCT TGTTAACTTG TTTATTGCAG 9900 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 9960 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 10020 

CCAGGAAGCT CCTCTGTGTC CTCATAAACC CTAACCTCCT CTACTTGAGA GGACATTCCA 10080 

ATCATAGGCT GCCCATCCAC CCTCTGTGTC CTCCTGTTAA TTAGGTCACT TAACAAAAAG 10140 

GAAATTGGGT AGGGGTTTTT CACAGACCGC TTTCTAAGGG TAATTTTAAA ATATCTGGGA 10200 

AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG TTGGTAAACA GCCCACAAAT CTCAACAGCA 10260 

GAAACATACA AGCTGTCAGC TTTGCACAAG GGCCCAACAC CCTGCTCAGC AAGAAGCACT 10320 

GTGGTTGCTG TGTTAGTAAT GTGCAAAACA GGAGGCACAT TTTCCCCACC TGTGTAGGTT 10380

CCAAAATATC TAGTGTTTTC ATTTTTACTT GGATCAGGAA CCCAGCACTC CACTGGATAA 10440

GCATTATCCT TATCCAAAAC AGCCTTGTGG TCAGTGTTCA TCTGCTGACT GTCAACTGTA 10500

GCATTTTTTG GGGTTACAGT TTGAGCAGGA TATTTGGTCC TGTAGTTTGC TAACACACCC 10560 

TGCAGCTCCA AAGGTTCCCC ACCAACAGCA AAAAAATGAA AATTTGACCC TTGAATGGGT 10620 

TTTCCAGCAC CATTTTCATG AGTTTTTTGT GTCCCTGAAT GCAAGTTTAA CATAGCAGTT 10680 

ACCCCAATAA CCTCAGTTTT AACAGTAACA GCTTCCCACA TCAAAATATT TCCACAGGTT 10740 

AAGTCCTCAT TTAAATTAGG CAAAGGAA 10768 
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6464 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTGTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020
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AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTCGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA
AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA 
AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT 

tatcatgtct ggatcccagg ccagacgcca acaaggtagg 
gtttcacccc accgcacgga ggccttttgg ggtggagccc 
aaactttgcc agcaaatccg cctcctgcct ccaccaatcg 
ccccgctgtc tccacctttg agaaacactc atcctcaggc 
ccttccacca aactctgcaa gatcccagag tgagaggcct 
ccagttcagg aacagtaaac cctgttctga ctactgcctc 
cgaaattcca gctggcattc cggtactgtt ggtaaaatgg 

AAAGGCCCGG CGCCATTCTA TCCTCTAGAG GATGGAACCG 
GCTATGAAGA GATACGCCCT GGTTCCTGGA ACAATTGCTT 
GTGAACATCA CGTACGCGGA ATACTTCGAA ATGTCCGTTC 
CGATATGGGC TGAATACAAA TCACAGAATC GTCGTATGCA 
TTTATGCCGG TGTTGGGCGC GTTATTTATC GGAGTTGCAG 
TATAATGAAC GTGAATTGCT CAACAGTATG AACATTTCGC 
TCCAAAAAGG GGTTGCAAAA AATTTTGAAC GTGCAAAAAA 
ATTATTATCA TGGATTCTAA AACGGATTAC CAGGGATTTC 
ACATCTCATC TACCTCCCGG TTTTAATGAA TACGATTTTG 
GACAAAACAA TTGCACTGAT AATGAATTCC TCTGGATCTA 
GCCCTTCCGC ATAGAACTGC CTGCGTCAGA TTCTCGCATG 
AATCAAATCA TTCCGGATAC TGCGATTTTA AGTGTTGTTC 
ATGTTTACTA CACTCGGATA TTTGATATGT GGATTTCGAG 
GAAGAAGAGC TGTTTTTACG ATCCCTTCAG GATTACAAAA
CCAACCCTAT TTTCATTCTT CGCCAAAAGC ACTCTGATTG 
TTACACGAAA TTGCTTCTGG GGGCGCACCT CTTTCGAAAG 
AAACGCTTCC ATCTTCCAGG GATACGACAA GGATATGGGC TCACTGAGAC TACATCAGCT 4200

ATTCTGATTA CACCCGAGGG GGATGATAAA CCGGGCGCGG TCGGTAAAGT TGTTCCATTT 4260 

TTTGAAGCGA AGGTTGTGGA TCTGGATACC GGGAAAACGC TGGGCGTTAA TCAGAGAGGC 4320 

GAATTATGTG TCAGAGGACC TATGATTATG TCCGGTTATG TAAACAATCC GGAAGCGACC 4380 

AACGCCTTGA TTGACAAGGA TGGATGGCTA CATTCTGGAG ACATAGCTTA CTGGGACGAA 4440

GACGAACACT TCTTCATAGT TGACCGCTTG AAGTCTTTAA TTAAATACAA AGGATATCAG 4500 

GTGGCCCCCG CTGAATTGGA ATCGATATTG TTACAACACC CCAACATCTT CGACGCGGGC 4560 

GTGGCAGGTC TTCCCGACGA TGACGCCGGT GAACTTCCCG CCGCCGTTGT TGTTTTGGAG 4620

CACGGAAAGA CGATGACGGA AAAAGAGATC GTGGATTACG TCGCCAGTCA AGTAACAACC 4680 

GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT GTGGACGAAG TACCGAAAGG TCTTACCGGA 4740 

AAACTCGACG CAAGAAAAAT GAGAGAGATC CTCATAAAGG CCAAGAAGGG CGGAAAGTCC 4800 

AAATTGTAAA ATGTAACTGT ATTCAGCGAT GACGAAATTC TTAGCTATTG TAATGACTCT 4860 

AGAGGATCTT TGTGAAGGAA CCTTACTTCT GTGGTGTGAC ATAATTGGAC AAACTACCTA 4920 

CAGAGATTTA AAGCTCTAAG GTAAATATAA AATTTTTAAG TGTATAATGT GTTAAACTAC 4980 

TGATTCTAAT TGTTTGTGTA TTTTAGATTC CAACCTATGG AACTGATGAA TGGGAGCAGT 5040 

GGTGGAATGC CTTTAATGAG GAAAACCTGT TTTGCTCAGA AGAAATGCCA TCTAGTGATG 5100 

ATGAGGCTAC TGCTGACTCT CAACATTCTA CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG 5160 

ACCCCAAGGA CTTTCCTTCA GAATTGCTAA GTTTTTTGAG TCATGCTGTG TTTAGTAATA 5220 

GAACTCTTGC TTGCTTTGCT ATTTACACCA CAAAGGAAAA AGCTGCACTG CTATACAAGA 5280 

AAATTATGGA AAAATATTCT GTAACCTTTA TAAGTAGGCA TAACACTTAT AATCATAACA 5340 

TACTGTTTTT TCTTACTCCA CACAGGCATA GAGTGTCTGC TATTAATAAC TATGCTCAAA 5400 

AATTGTGTAC CTTTAGCTTT TTAATTTGTA AAGGGGTTAA TAAGGAATAT TTGATGTATA 5460 

GTGCCTTGAC TAGAGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 5520 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 5580 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 5640 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 5700 
TATCATGTCT GGATCCCCAG GAAGCTCCTC TGTGTCCTCA TAAACCCTAA CCTCCTCTAC 5760

TTGAGAGGAC ATTCCAATCA TAGGCTGCCC ATCCACCCTC TGTGTCCTCC TGTTAATTAG 5820 

GTCACTTAAC AAAAAGGAAA TTGGGTAGGG GTTTTTCACA GACCGCTTTC TAAGGGTAAT 5880 

TTTAAAATAT CTGGGAAGTC CCTTCCACTG CTGTGTTCGA GAAGTGTTGG TAAACAGCCC 5940

ACAAATGTCA ACAGCAGAAA CATACAAGCT GTCAGCTTTG CACAAGGGCC CAACACCCTG 6000

CTCAGCAAGA AGCACTGTGG TTGCTGTGTT AGTAATGTGC AAAACAGGAG GCACATTTTC 6060

CCCACCTGTG TAGGTTCCAA AATATCTAGT GTTTTCATTT TTACTTGGAT CAGGAACCCA 6120 

GCACTCCACT GGATAAGCAT TATCCTTATC CAAAACAGCC TTGTGGTCAG TGTTCATCTG 6180 

CTGACTGTCA ACTGTAGCAT TTTTTGGGGT TACAGTTTGA GCAGGATATT TGGTCCTGTA 6240 

GTTTGCTAAC ACACCCTGCA GCTCCAAAGG TTCCCCACCA ACAGCAAAAA AATGAAAATT 6300 

TGACCCTTGA ATGGGTTTTC CAGCACCATT TTCATGAGTT TTTTGTGTCC CTGAATGCAA 6360 

GTTTAACATA GCAGTTACCC CAATAACCTC AGTTTTAACA GTAACAGCTT CCCACATCAA 6420 

AATATTTCCA CAGGTTAAGT CCTCATTTAA ATTAGGCAAA GGAA 6464 
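Each full line of the listing above carries six blocks of ten bases followed by a cumulative base count, so the printed counts can be used to spot bases lost or gained during transcription. The following is a minimal sketch of such a check, assuming that layout; the function names `parse_listing_line` and `check_listing` are hypothetical and are not part of the original disclosure.

```python
def parse_listing_line(line):
    """Split one listing line into (sequence, printed cumulative count)."""
    tokens = line.split()
    # Assumption: the final token is the cumulative base count for the line.
    count = int(tokens[-1])
    # Join the base blocks, ignoring the spaces that separate them.
    sequence = "".join(tokens[:-1]).upper()
    return sequence, count


def check_listing(lines, start=0):
    """Return indices of lines whose running total disagrees with the printed count."""
    mismatches = []
    total = start
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # blank separator lines carry no bases
        sequence, count = parse_listing_line(line)
        total += len(sequence)
        if total != count:
            mismatches.append(index)
            total = count  # resynchronise so one bad line is reported once
    return mismatches


first_lines = [
    "TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60",
    "",
    "AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120",
]
print(check_listing(first_lines))
```

Lines that have lost their trailing count cannot be checked this way and would need to be excluded before calling the function.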
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TGASTCA 7 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TGGNNNNNNN GCCCAA 16
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
TGGCA 5 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TGACACA

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TGAGTCA 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TGANACA 7

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TGATACA 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CCNTGTNT 
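
Several of the short sequences above (SEQ ID NOs 26, 27, 31 and 33) contain the degenerate symbols S and N. Under the standard IUPAC nucleotide code, S stands for C or G and N for any base. The sketch below shows one way such a motif could be located in a longer sequence; it is an illustration only, the helper names `motif_to_pattern` and `find_motif` are hypothetical, and it searches only the strand supplied, not the reverse complement.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_pattern(motif):
    """Translate a degenerate motif such as TGASTCA into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper() if not base.isspace())


def find_motif(motif, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead lets overlapping occurrences be reported.
    lookahead = re.compile("(?=" + motif_to_pattern(motif) + ")")
    return [match.start() for match in lookahead.finditer(sequence.upper())]


print(find_motif("TGASTCA", "AATGAGTCATT"))
print(find_motif("CCNTGTNT", "CCATGTGT"))
```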
WE CLAIM:

1. A method for quantifying the amount of transforming
growth factor-β (TGF-β) in a liquid sample, which method
comprises:

(a) incubating said liquid sample together with
eucaryotic cells that contain a TGF-β responsive expression
vector having a gene encoding luciferase for a predetermined
time period sufficient for said eucaryotic cells to express a
detectable amount of said luciferase;

(b) measuring the amount of said luciferase
expressed during said time period; and

(c) determining the amount of TGF-β present in
said sample by comparing the measured amount of said luciferase
against a reference curve.

2. The method in accordance with claim 1 wherein the
reference curve represents a series of measured amounts of said
luciferase produced from a series of known concentrations of
TGF-β by said eucaryotic cells.
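
The comparison against a reference curve recited in claims 1 and 2 can be illustrated numerically. The sketch below assumes the curve is stored as (concentration, luciferase signal) pairs and reads a sample off it by linear interpolation between the two bracketing standards. The interpolation scheme, the function name `interpolate_concentration`, and the example values are assumptions made for illustration; the claims do not prescribe any particular curve-fitting method, units, or data.

```python
def interpolate_concentration(signal, standards):
    """Estimate concentration from a measured signal using a reference curve.

    standards: iterable of (known_concentration, measured_signal) pairs.
    Assumes the signal increases with concentration over the range used.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    if len(points) < 2:
        raise ValueError("at least two standards are needed")
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal lies outside the reference curve")
    # Walk adjacent standards until the pair bracketing the signal is found.
    for (conc_lo, sig_lo), (conc_hi, sig_hi) in zip(points, points[1:]):
        if sig_lo <= signal <= sig_hi:
            if sig_hi == sig_lo:
                return conc_lo
            fraction = (signal - sig_lo) / (sig_hi - sig_lo)
            return conc_lo + fraction * (conc_hi - conc_lo)


# Hypothetical standards: (concentration, luciferase signal in arbitrary units).
reference_curve = [(0.0, 100.0), (10.0, 600.0), (50.0, 2100.0)]
print(interpolate_concentration(350.0, reference_curve))
```

Signals outside the range of the standards raise an error instead of being extrapolated, since the curve says nothing about behaviour beyond its end points.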

3. The method in accordance with claim 1 wherein said
eucaryotic cells are mammalian cells.

4. The method in accordance with claim 3 wherein said
mammalian cells are members of the group consisting of mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells, and NIH 3T3 cells.

5. The method in accordance with claim 1 wherein the
TGF-β responsive expression vector is a plasmid comprising, in
the direction of transcription, a regulatory region that
includes at least one TGF-β inducible response element that is
operatively linked to a promoter, and a structural region
downstream of said promoter, said response element being
capable of inducing dose-dependent luciferase activity and said
structural region coding for said luciferase.

6. The method in accordance with claim 5 wherein said
plasmid includes a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-10.

7. The method in accordance with claim 5 wherein said
plasmid has the identifying characteristics of a plasmid
selected from the group consisting of plasmid ATCC Accession
Number 75627, plasmid ATCC Accession Number 14678 and plasmid
ATCC Accession Number 75629.

8. The method in accordance with claim 5 wherein said
TGF-β inducible response element comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 11-17.

9. The method in accordance with claim 5 wherein said
promoter comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 18
and 19.

10. The method in accordance with claim 1 wherein said
eucaryotic cells are stably transformed cells that contain said
TGF-β responsive vector, and wherein said vector also includes
a gene encoding a selectable marker.

11. The method in accordance with claim 10 wherein said
vector is a plasmid comprising a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 1-6.

12. The method in accordance with claim 1 wherein said
eucaryotic cells are transiently transformed cells that contain
said TGF-β responsive vector, and wherein said vector is a
plasmid comprising a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 7-10.

13. The method in accordance with claim 1 wherein said
liquid sample is selected from the group consisting of a body
fluid, culture medium and a tissue extract.

14. A method for quantifying the amount of transforming
growth factor-β (TGF-β) in a liquid sample comprising:

(a) providing, in eucaryotic cells capable of
expressing an indicator molecule, a plasmid comprising, in the
direction of transcription, a regulatory region that includes
at least one TGF-β inducible response element that is
operatively linked to a promoter, and a structural region
downstream of said promoter, said response element being
capable of inducing dose-dependent indicator molecule activity
and said structural region coding for said indicator molecule;

(b) incubating said liquid sample with said
eucaryotic cells for a predetermined time period sufficient for
said eucaryotic cells to express a detectable amount of said
indicator molecule;

(c) measuring the amount of said indicator
molecule expressed during said time period; and

(d) comparing the measured amount of said
indicator molecule produced in step (c) with the amount of
indicator molecule produced in a control assay performed
according to steps (a) through (c) by treating said liquid
sample with an anti-TGF-β antibody to obtain a net measured
amount of said indicator molecule induced by said TGF-β.

15. The method in accordance with claim 14 wherein said
liquid sample contains an isoform of TGF-β selected from the
group consisting of TGF-β1, TGF-β2 and TGF-β3.

16. The method in accordance with claim 14 wherein said
liquid sample is selected from the group consisting of a body
fluid, culture medium and a tissue extract.

17. The method in accordance with claim 14 wherein said
eucaryotic cell is a mammalian cell.

18. The method in accordance with claim 14 wherein said
mammalian cell is selected from the group consisting of mink
lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells,
Hep3B cells, GM7373 cells and NIH 3T3 cells.

19. The method in accordance with claim 14 wherein said
indicator molecule is luciferase.

20. The method in accordance with claim 14 wherein said
plasmid comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-10.

21. The method in accordance with claim 14 wherein said
TGF-β inducible response element comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 11-17.

22. The method in accordance with claim 14 wherein said
promoter comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 18
and 19.

23. The method in accordance with claim 14 wherein said
plasmid has the identifying characteristics of a plasmid
selected from the group consisting of plasmid ATCC Accession
Number 75627, plasmid ATCC Accession Number 74628 and plasmid
ATCC Accession Number 75629.

24. The method in accordance with claim 14 wherein said
eucaryotic cells are stably transformed cells that contain said
plasmid, and wherein said plasmid contains a gene encoding a
selectable marker for the selection of said stably transformed
cells.

25. The method in accordance with claim 24 wherein said
plasmid comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-6.

26. The method in accordance with claim 14 wherein said
eucaryotic cells are stably transformed cells that contain the
TGF-β response element having the nucleotide sequence in SEQ ID
NO 11, and wherein said cells correspond to cells on deposit
with ATCC having the ATCC Accession Number CRL 11508.

27. The method in accordance with claim 14 wherein
eucaryotic cells comprise transiently transformed cells that
contain said plasmid comprising a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 7-10.

28. The method in accordance with claim 14 further
comprising the step of:

(e) determining the amount of said TGF-β present in
said sample by comparing the measured amount of said indicator
molecule obtained in step (d) against a reference curve.
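
Step (d) of claim 14 yields a net amount of indicator molecule by comparing the sample with a parallel control treated with anti-TGF-β antibody, and step (e) of claim 28 then reads that net amount against a reference curve. The arithmetic of step (d) can be sketched as a simple subtraction of replicate means. Averaging replicates, clamping negative results to zero, and the function name `net_indicator_signal` are assumptions added for illustration; the claims specify none of them.

```python
from statistics import mean


def net_indicator_signal(sample_readings, antibody_control_readings):
    """Signal attributable to TGF-β induction.

    sample_readings: indicator readings from cells incubated with the sample.
    antibody_control_readings: readings from the parallel control in which
    the sample was treated with anti-TGF-β antibody.
    """
    net = mean(sample_readings) - mean(antibody_control_readings)
    # Assumption: a control that reads higher than the sample is treated as
    # "no detectable TGF-β-induced signal" and reported as zero.
    return max(net, 0.0)


# Hypothetical readings in arbitrary luminescence units.
print(net_indicator_signal([1200.0, 1300.0], [200.0, 300.0]))
```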
29. The method in accordance with claim 28 wherein said
reference curve represents a series of measured amounts of said
indicator molecule produced from a series of known
concentrations of TGF-β in said eucaryotic cells.

30. A plasmid vector in substantially pure form capable
of causing expression of an indicator molecule in a eucaryotic
cell, said plasmid including in the direction of transcription,
a first nucleotide sequence comprising a regulatory region that
includes at least one TGF-β inducible response element
operatively linked to a promoter, a second nucleotide sequence
comprising a structural region downstream of said promoter and
coding for said indicator molecule, and a third nucleotide
sequence comprising a gene encoding a selectable marker for the
selection of a stably transformed cell, said response element
being capable of inducing dose-dependent luciferase activity
and said structural region coding for said luciferase.

31. The plasmid vector in accordance with claim 30
capable of expressing a chemiluminescent indicator molecule.

32. The plasmid vector in accordance with claim 30
wherein said plasmid comprises a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 1-6.

33. The plasmid vector in accordance with claim 30
wherein said TGF-β inducible response element comprises a
nucleotide sequence that corresponds to a sequence selected
from the group consisting of SEQ ID NOs 11-17.

34. The plasmid vector in accordance with claim 30
wherein said promoter comprises a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 18 and 19.

35. The plasmid vector in accordance with claim 30
wherein said gene comprises the nucleotide sequence in SEQ ID
NO 20.

36. A plasmid vector in substantially pure form and
capable of causing expression of luciferase in a eucaryotic
cell, said plasmid comprising in the direction of
transcription, a regulatory region that includes at least one
TGF-β inducible response element that is operatively linked to
a promoter, and a structural region downstream of said promoter
for transcription therefrom and coding for said luciferase,
said response element being capable of inducing dose-dependent
luciferase activity and said structural region coding for said
luciferase, and wherein said plasmid has the identifying
characteristics of a plasmid selected from the group consisting
of plasmid ATCC Accession Number 75627, plasmid ATCC Accession
Number 74628 and plasmid ATCC Accession Number 75629.

37. A plasmid vector in substantially pure form and
capable of causing expression of luciferase in a eucaryotic
cell, said plasmid comprising in the direction of
transcription, a regulatory region that includes at least one
TGF-β inducible response element that is operatively linked to
a promoter, and a structural region downstream of said promoter
for transcription therefrom and coding for said luciferase,
said response element being capable of inducing dose-dependent
luciferase activity and said structural region coding for said
luciferase, and wherein said plasmid comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 7-10.

38. A eucaryotic cell containing a plasmid vector having
a nucleotide sequence that corresponds to a sequence selected
from the group consisting of SEQ ID NOs 1-10.

39. The eucaryotic cell in accordance with claim 38
wherein said cell is selected from the group consisting of mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells and NIH 3T3 cells.

40. A kit useful in assaying the amount of TGF-β in a
liquid sample comprising (a) packaging material; (b) eucaryotic
cells contained within said packaging material, said cells
capable of expressing an indicator molecule and containing a
plasmid comprising, in the direction of transcription, a
regulatory region that includes at least one TGF-β inducible
response element that is operatively linked to a promoter, and
a structural region downstream of said promoter, said response
element being capable of inducing dose-dependent indicator
molecule activity and said structural region coding for said
indicator molecule; and (c) an aliquot of TGF-β contained
within said packaging material, said TGF-β used for generating
a reference curve representing a measured amount of the
indicator molecule produced from a known concentration of
TGF-β.

41. The kit in accordance with claim 40 wherein said
eucaryotic cells are selected from the group consisting of mink
lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells,
Hep3B cells, GM7373 cells and NIH 3T3 cells.

42. The kit in accordance with claim 40 wherein said
plasmid comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-10.

43. The kit in accordance with claim 40 wherein said
plasmid comprises a plasmid having the identifying
characteristics of a plasmid selected from the group consisting
of plasmid ATCC Accession Number 75627, plasmid ATCC Accession
Number 74628 and plasmid ATCC Accession Number 75629.

44. The kit in accordance with claim 40 wherein said
packaging material comprises a label indicating that said
eucaryotic cells can be used for determining the amount of
TGF-β in said liquid sample comprising the steps of (a) incubating
said cells with said liquid sample; (b) measuring the amount of
said indicator molecule produced thereby; and (c) comparing the
amount of measured indicator molecule with said reference
curve.

45. The kit in accordance with claim 40 wherein said
eucaryotic cells are stably transformed cells that contain the
TGF-β response element having the nucleotide sequence in SEQ ID
NO 11, and wherein said cells correspond to cells on deposit
with ATCC having the ATCC Accession Number CRL 11508.

46. The kit in accordance with claim 40 further
comprising: (d) an anti-TGF-β antibody for use in a parallel
control assay for determining the amount of indicator molecule
produced other than by TGF-β induction.